
The Patent Office 
Concept House 
Cardiff Road 
Newport 
South Wales 
NP10 8QQ



I, the undersigned, being an officer duly authorised in accordance with Section 74(1) and (4)
of the Deregulation & Contracting Out Act 1994, to sign and issue certificates on behalf of the
Comptroller-General, hereby certify that annexed hereto is a true copy of the documents as
originally filed in connection with the patent application identified therein.



In accordance with the Patents (Companies Re-registration) Rules 1982, if a company named
in this certificate and any accompanying documents has re-registered under the Companies Act
1980 with the same name as that with which it was registered immediately before
re-registration save for the substitution as, or inclusion as, the last part of the name of the words
"public limited company" or their equivalents in Welsh, references to the name of the company
in this certificate and any accompanying documents shall be treated as references to the name
with which it is so re-registered.



In accordance with the rules, the words "public limited company" may be replaced by p.l.c.,
plc, P.L.C. or PLC.






Re-registration under the Companies Act does not constitute a new legal entity but merely
subjects the company to certain additional company law rules.




Signed 
Dated 17 March 2004 



CERTIFIED COPY OF 
PRIORITY DOCUMENT 



An Executive Agency of the Department of Trade and Industry 






Patents Form 1/77



Request for grant of a patent



The Patent Office 

Cardiff Road 
Newport 
NP9 1RH 



1. Your Reference. IMR/CEE/Y1402 16JUL03 E822773-2 



Application number 0316532.1 



3. Full name, address and postcode 
of the or each Applicant 

Country/state of incorporation 
(if applicable) 


Transitive Limited 
5th Floor, Alder Castle
10 Noble Street 
London 
EC2V 7QJ 




Incorporated in: United Kingdom 


4. Title of the invention 


Method and Apparatus for Partitioning Code in 
Program Code Conversion 


5. Name of agent 


APPLEYARD LEES 


Address for service in the UK to 
which all correspondence should 
be sent 


15 CLARE ROAD 
HALIFAX 
HX1 2HY 


Patents ADP number 


190001



6. Priority claimed to: Country Application number Date of filing 



7. Divisional status claimed from: Number of parent application Date of filing 



8. Is a statement of inventorship and YES 
of right to grant a patent required in 
support of this application? 






9. Enter the number of sheets for any
of the following items you are
filing with this form. Do not count
copies of the same document.

Continuation sheets of this form:

Description: 84

Claim(s): 17

Abstract: 1

Drawing(s):




10. If you are also filing any of the 
following, state how many against 
each item 

Priority documents 

Translation of priority documents 

Statement of inventorship and 
right to grant a patent (PF 7/77) 

Request for a preliminary 
examination and search (PF 9/77) 

Request for substantive 
examination (PF 10/77) 

Any other documents 
(please specify) 



11. 



We request the grant of a patent on the basis of this application. 
Signature Date 



APPLEYARD LEES 



14 July 2003 




12. Contact 



Ian Robinson- 01422 330110 






METHOD AND APPARATUS FOR PARTITIONING 
CODE IN PROGRAM CODE CONVERSION 



The subject invention relates generally to the field
of computers and computer software and, more particularly,
to program code conversion methods and apparatus useful,
for example, in code translators, emulators and
accelerators.

In both embedded and non-embedded CPUs, one finds
predominant Instruction Set Architectures (ISAs) for which
large bodies of software exist that could be "accelerated"
for performance, or "translated" to a myriad of capable
processors that could present better cost/performance
benefits, provided that they could transparently access
the relevant software. One also finds dominant CPU
architectures that are locked in time to their ISA, and
cannot evolve in performance or market reach. Such
architectures would benefit from "Synthetic CPU"
co-architecture.

Program code conversion methods and apparatus
facilitate such acceleration, translation and
co-architecture capabilities and are addressed, for example,
in WO 99/03168 entitled Program Code Conversion.

During program code conversion of a subject program
designed for a subject architecture to a target program
executable by a target architecture, a problem arises with
respect to code that is self-modifying. "Self-modifying
code" refers to a subject program that intentionally
modifies its own subject code. There are several reasons
why a program might modify its own code. Examples of
self-modifying code are shown in Table 1.



Code: Function

Overlays: To save address space, a single process can re-use a
subject address range to hold different libraries at different
times. Such uses may or may not be associated with system calls
to mmap() and munmap().

Trampolines: ... call to code elsewhere in the system.

Code Patching: ... or breakpoint operations.

Run-Time Compilers: ... compilers. Such programs potentially write
many fragments of subject code all over the data area.

Signal Handler: ... caused the exception and continue.

Table 1: Examples of Self-Modifying Code

One of the problems presented by self-modifying code to
dynamic translators is that the subject code that was
modified may correspond to target code which has already
been translated. When such a modification of subject code
occurs, all translations of the modified subject code must
be identified and discarded. Thus, the translator must be
able to identify all target code sequences (i.e.,
translations) that correspond to particular subject code
addresses being modified.

In dynamic translators, finding and deleting the target
code which corresponds to a given subject address is
difficult and sometimes not even possible. In some
situations, optimizations are applied during translation
which yield translations that no longer correspond to the
range of subject addresses that the translations originally
represent. In these situations, if the subject program
modifies its own code at certain subject addresses, the
translator has no way to identify which respective
translated target code to invalidate.

According to the present invention there is provided an
apparatus and method as set forth in the appended
claims. Preferred features of the invention will be
apparent from the dependent claims, and the description
which follows.

The following is a summary of various aspects and
advantages realizable according to the invention. It is
provided as an introduction to assist those skilled in the
art to more rapidly assimilate the detailed discussion that
ensues and does not and is not intended in any way to
limit the scope of the claims that are appended hereto.

In particular, the inventors have developed a technique
directed at expediting program code conversion,
particularly useful in connection with a run-time
translator which employs translation of successive basic
blocks of subject program code into target code wherein
the target code corresponding to a first basic block is
executed prior to generation of target code for the next
basic block.

A partitioning technique is provided whereby the translator
divides the subject code space into regions, referred to
hereafter as partitions, where each partition contains a
distinct set of subject code and corresponding target
code. In this manner, when the subject program modifies
subject code, only those partitions actually affected by
the modification need be discarded, and all translations
in unaffected partitions can be kept. This partitioning
technique is advantageous in limiting the amount of target
code that must be retranslated. The partitioning technique
also allows multi-threaded subject programs that also
involve self-modifying code to perform code modification
in a thread-safe manner.
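
As a minimal sketch of the partitioning idea summarized above, the following Python fragment groups translations by fixed-size subject address regions, so that a write into one region discards only that region's translations. The class name `PartitionedCache`, the 4 KiB partition size and the string stand-ins for target code are all assumptions made for illustration; the text above does not specify how partitions are sized or stored.

```python
from collections import defaultdict

PARTITION_SIZE = 0x1000  # assumed partition granularity (one 4 KiB region)


class PartitionedCache:
    """Hypothetical translation store that groups translations by partition."""

    def __init__(self, partition_size=PARTITION_SIZE):
        self.partition_size = partition_size
        # partition index -> {subject address -> target code}
        self.partitions = defaultdict(dict)

    def partition_of(self, subject_address):
        return subject_address // self.partition_size

    def add(self, subject_address, target_code):
        index = self.partition_of(subject_address)
        self.partitions[index][subject_address] = target_code

    def lookup(self, subject_address):
        index = self.partition_of(subject_address)
        return self.partitions.get(index, {}).get(subject_address)

    def invalidate(self, modified_address):
        """Discard every translation in the partition containing the write."""
        return self.partitions.pop(self.partition_of(modified_address), {})


cache = PartitionedCache()
cache.add(0x1000, "target code A")
cache.add(0x1040, "target code B")
cache.add(0x5000, "target code C")
discarded = cache.invalidate(0x1010)  # subject program writes into partition 1
```

After the call to `invalidate`, the two translations in the modified partition are gone while the translation at 0x5000 is still available, which is the saving in retranslation that the partitioning technique aims for.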

The accompanying drawings, which are incorporated in
and constitute a part of the specification, illustrate
presently preferred implementations and are described as
follows:

... corresponding IR (intermediate representation)
generated during the process;

Figure 3 is a schematic illustrating a basic block
data structure and cache according to an illustrative
embodiment of the invention;

Figure 4 is a flow diagram illustrating extended
basic blocks;

Figure 5 is a flow diagram illustrating isoblocking;

Figure 7 is a schematic diagram of an example
illustrating group block optimization;

Figure 8 is a flow diagram illustrating run-time
translation, including extended basic blocking,
isoblocking, and group blocking;

Figure 9 is a schematic diagram of an example
illustrating ...

Figure 11 is a schematic diagram of an example
illustrating control flow between partitions by the
translator of the present invention.

Illustrative apparatus for implementing various novel
features discussed below is shown in Figure 1. Figure 1
illustrates a target processor 13 including target
registers 15 together with memory 18 storing a number of
software components 19, 20, 21, and including a basic
block cache 23, a global register store 27, and the
subject code 17 to be translated. The software components
include an operating system 20, the translator code 19,
and translated code 21. The translator code 19 may
function, for example, as an emulator translating subject
code of one ISA into translated code of another ISA or as
an accelerator for translating subject code into
translated code, each of the same ISA.

The translator 19, i.e., the compiled version of the
source code implementing the translator, and the
translated code 21, i.e., the translation of the subject
code 17 produced by the translator 19, run in conjunction
with the operating system 20 such as, for example, UNIX
running on the target processor 13, typically a
microprocessor or other suitable computer. It will be
appreciated that the structure illustrated in Figure 1 is
exemplary only and that, for example, software, methods
and processes according to the invention may be
implemented in code residing within or beneath an
operating system. The subject code, translator code,
operating system, and storage mechanisms may be any of a
wide variety of types, as known to those skilled in the
art.

In apparatus according to Figure 1, program code
conversion is preferably performed dynamically, at
run-time, while the translated code 21 is running. The
translator 19 runs inline with the translated program 21.
The execution path of the translation process is a control
loop comprising the steps of: executing translator code 19,
which translates a block of the subject code 17 into
translated code 21, and then executing that block of
translated code; the end of each block of translated code
contains instructions to return control back to the
translator code 19. In other words, the steps of
translating and then executing the subject code are
interlaced, such that only portions of the subject program
17 are translated at a time and the translated code of a
first basic block is executed prior to the translation of
subsequent basic blocks. The translator's fundamental
unit of translation is the basic block, meaning that the
translator 19 translates the subject code 17 one basic
block at a time. A basic block is formally defined as a
section of code with exactly one entry point and exactly
one exit point, which limits the block code to a single
control path. For this reason, basic blocks are the
fundamental unit of control flow.
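
The control loop described above can be sketched as follows. This is an illustrative Python model only: the function names `run` and `toy_translate`, the dictionary of subject blocks and the use of a Python callable as "translated code" are assumptions, not details taken from the text.

```python
def run(subject_blocks, entry_address, translate):
    """Interleave translation and execution, one basic block at a time.

    subject_blocks maps a subject address to that block's subject code;
    translate turns subject code into a callable that runs the block and
    returns the successor's subject address (None ends the program).
    """
    trace = []
    address = entry_address
    while address is not None:
        translated = translate(subject_blocks[address])  # translator code runs
        trace.append(address)
        address = translated()  # translated code runs, then returns control
    return trace


def toy_translate(subject_code):
    # Stand-in "code generator": the subject code here is just the successor.
    return lambda: subject_code


blocks = {0x100: 0x200, 0x200: 0x300, 0x300: None}
order = run(blocks, 0x100, toy_translate)
```

Each iteration translates exactly one block and then executes it, so the block at 0x200 is not translated until the block at 0x100 has finished executing.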

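In the process of generating the translated code 21,
intermediate representation ("IR") trees are generated
based on the subject instruction sequence. IR trees are
abstract representations of the expressions calculated and
operations performed by the subject program. Later,
translated code 21 is generated based on the IR trees.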

The collections of IR nodes described herein are
colloquially referred to as "trees". We note that,
formally, such structures are in fact directed acyclic
graphs (DAGs), not trees. The formal definition of a tree
requires that each node have at most one parent. Because
the embodiments described use common subexpression
elimination during IR generation, nodes will often have
multiple parents. For example, the IR of a flag-affecting
instruction result may be referred to by two abstract
registers, those corresponding to the destination subject
register and the flag result parameter.

For example, the subject instruction "add %r1, %r2,
%r3" performs the addition of the contents of subject
registers %r2 and %r3 and stores the result in subject
register %r1. Thus, this instruction corresponds to the
abstract expression "%r1 = %r2 + %r3". This example
contains a definition of the abstract register %r1 with an
add expression containing two subexpressions representing
the instruction operands %r2 and %r3. In the context of a
subject program 17, these subexpressions may correspond to
other, prior subject instructions, or they may represent
details of the current instruction such as immediate
constant values.

When the "add" instruction is parsed, a new "+" IR
node is generated, corresponding to the abstract
mathematical operator for addition. The "+" IR node
stores references to other IR nodes that represent the
operands (represented in the IR as subexpression trees,
often held in subject registers). The "+" node is itself
referenced by the subject register whose value it defines
(the abstract register for %r1, the instruction's
destination register). For example, the center-right
portion of Figure 2 shows the IR tree corresponding to
the X86 instruction "add %ecx, %edx".

As those skilled in the art may appreciate, in one
embodiment the translator 19 is implemented using an
object-oriented programming language such as C++. For
example, an IR node is implemented as a C++ object, and
references to other nodes are implemented as C++
references to the C++ objects corresponding to those other
nodes. An IR tree is therefore implemented as a
collection of IR node objects, containing various
references to each other.
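
To make the object structure concrete, here is a small Python sketch of IR nodes that hold references to their operand nodes. The text describes a C++ embodiment; Python is used here purely for brevity, and the class name `IRNode`, the operator strings and the `evaluate` helper are hypothetical.

```python
class IRNode:
    """One IR node; operands are plain references to other IRNode objects."""

    def __init__(self, op, *operands):
        self.op = op
        self.operands = operands

    def evaluate(self, registers):
        if self.op == "reg":      # leaf: read a subject register value
            return registers[self.operands[0]]
        if self.op == "const":    # leaf: immediate constant
            return self.operands[0]
        if self.op == "+":
            return sum(node.evaluate(registers) for node in self.operands)
        raise ValueError(f"unknown operator {self.op!r}")


# IR for "add %r1, %r2, %r3": %r1 = %r2 + %r3
r2 = IRNode("reg", "r2")
r3 = IRNode("reg", "r3")
add = IRNode("+", r2, r3)

# Two roots refer to the same "+" node, so the structure is a DAG, not a tree.
destination = add
flag_result = add
value = add.evaluate({"r2": 4, "r3": 5})
```

Because `destination` and `flag_result` are references to one shared node object, the addition is represented once even though two abstract registers depend on it.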

Further, in the embodiment under discussion, IR
generation uses a set of abstract registers. These
abstract registers correspond to specific features of the
subject architecture. For example, there is a unique
abstract register for each physical register on the
subject architecture ("subject register"). Similarly,
there is a unique abstract register for each condition
code flag present on the subject architecture. Abstract
registers serve as placeholders for IR trees during IR
generation. For example, the value of subject register
%r2 at a given point in the subject instruction sequence
is represented by a particular IR expression tree, which
is associated with the abstract register for subject
register %r2. In one embodiment, an abstract register is
implemented as a C++ object, which is associated with a
particular IR tree via a C++ reference to the root node
object of that tree.



In the example instruction sequence described above,
the translator has already generated IR trees
corresponding to the values of %r2 and %r3 while parsing
the subject instructions that precede the "add"
instruction. In other words, the subexpressions that
calculate the values of %r2 and %r3 are already
represented as IR trees. When generating the IR tree for
the "add %r1, %r2, %r3" instruction, the new "+" node
contains references to the IR subtrees for %r2 and %r3.
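
The role of abstract registers as placeholders can be sketched as below. This is a simplified model under stated assumptions: IR nodes are represented as Python tuples, the class `AbstractRegisters` and the function `parse_add` are invented for illustration, and a register that has not yet been defined in the block is modelled as a `("load", name)` leaf.

```python
class AbstractRegisters:
    """Hypothetical map from subject register name to the root of its IR tree."""

    def __init__(self):
        self.roots = {}

    def use(self, name):
        # A register not yet defined in this block is read from the register store.
        return self.roots.setdefault(name, ("load", name))

    def define(self, name, tree):
        self.roots[name] = tree


def parse_add(regs, dest, src1, src2):
    """Parse "add dest, src1, src2": build a "+" node over the existing subtrees."""
    node = ("+", regs.use(src1), regs.use(src2))
    regs.define(dest, node)
    return node


regs = AbstractRegisters()
parse_add(regs, "r2", "r4", "r5")          # earlier instruction defines %r2
node = parse_add(regs, "r1", "r2", "r3")   # add %r1, %r2, %r3
```

The new "+" node does not copy the expression for %r2; it holds a reference to the tree that the earlier instruction already attached to that abstract register.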

The implementation of the abstract registers is
divided between components in both the translator code 19
and the translated code 21. Within the translator 19, an
"abstract register" is a placeholder used in the course of
IR generation, such that the abstract register is
associated with the IR tree that calculates the value of
the subject register to which the particular abstract
register corresponds. As such, abstract registers in the
translator may be implemented as a C++ object which
contains a reference to an IR node object (i.e., an IR
tree). The aggregate of all IR trees referred to by the
abstract register set is referred to as the working IR
forest ("forest" because it contains multiple abstract
register roots, each of which refers to an IR tree). The
working IR forest represents a snapshot of the abstract
definition of the subject program at a particular point in
the subject code.

Within the translated code 21, an "abstract register"
is a specific location within the global register store,
to and from which subject register values are synchronized
with the actual target registers. Alternatively, when a
value has been loaded from the global register store, an
abstract register in the translated code 21 could be
understood to be a target register 15, which temporarily
holds a subject register value during the execution of the
translated code 21.

Figure 2 shows an example of translation of two basic
blocks of x86 instructions, and the corresponding IR trees
that are generated in the process of translation. The
left side of Figure 2 shows the execution path of the
translator 19 during translation. In step 151, the
translator 19 translates a first basic block 153 of
subject code into target code 21 and then, in step 155,
executes that target code 21. When the target code 21
finishes execution, control is returned to the translator
19, step 157, wherein the translator translates the next
basic block 159 of subject code 17 into target code 21 and
then executes that target code 21, step 161, and so on.

In the course of translating the first basic block 153
of subject code into target code, the translator 19
generates an IR tree 163 based on that basic block 153.
In this case, the IR tree 163 is generated from the source
instruction "add %ecx, %edx", which is a flag-affecting
instruction. In the course of generating the IR tree 163,
four abstract registers are defined by this instruction:
the destination abstract register %ecx 167, the first
flag-affecting instruction parameter 169, the second
flag-affecting instruction parameter 171, and the
flag-affecting instruction result 173. The IR tree
corresponding to the "add" instruction is a "+" operator
175 (i.e., arithmetic addition), whose operands are the
subject registers %ecx 177 and %edx 179.

Thus, emulation of the first basic block 153 puts the
flags in a pending state by storing the parameters and
result of the flag-affecting instruction. The
flag-affecting instruction is "add %ecx, %edx". The
parameters of the instruction are the current values of
emulated subject registers %ecx 177 and %edx 179. The "@"
symbol preceding the subject register uses 177, 179
indicates that the values of the subject registers are
retrieved from the global register store, from the
locations corresponding to %ecx and %edx, respectively, as
these particular subject registers were not previously
loaded by the current basic block. These parameter values
are then stored in the first and second flag parameter
abstract registers 169, 171. The result of the addition
operation 175 is stored in the flag result abstract
register 173.

After the IR tree is generated, the corresponding
target code 21 is generated based on the IR. After the
target code is generated, it is then executed, step 155.
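
The pending-flag bookkeeping described above can be illustrated with a short Python sketch. It assumes, as one plausible reading of "pending state", that flags are derived on demand from the recorded parameters and result; the class `PendingFlags`, the function `emulate_add` and the 32-bit width are hypothetical choices, and %ecx is treated as the destination to match the text.

```python
class PendingFlags:
    """Hypothetical record of the last flag-affecting instruction."""

    def __init__(self):
        self.param1 = self.param2 = self.result = None

    def record(self, param1, param2, result):
        self.param1, self.param2, self.result = param1, param2, result

    def zero_flag(self):
        return self.result == 0

    def carry_flag(self, bits=32):
        # Unsigned carry out of an addition of the two recorded parameters.
        return (self.param1 + self.param2) >= (1 << bits)


def emulate_add(registers, flags, dest, src, bits=32):
    """Emulate a two-operand add (dest = dest + src), leaving flags pending."""
    a, b = registers[dest], registers[src]
    registers[dest] = (a + b) & ((1 << bits) - 1)
    flags.record(a, b, registers[dest])


registers = {"ecx": 0xFFFFFFFF, "edx": 1}
flags = PendingFlags()
emulate_add(registers, flags, "ecx", "edx")
```

No flag value is computed at the time of the addition; a later instruction that reads a flag would call `zero_flag` or `carry_flag` and pay the cost only then.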

Figure 2 shows an example of translation and execution
interlaced. The translator 19 first generates translated
code 21 based on the subject instructions 17 of a first
basic block 153, then the translated code for basic block
153 is executed. At the end of the first basic block 153,
the translated code 21 returns control to the translator
19, which then translates a second basic block 159. The
translated code 21 for the second basic block 161 is then
executed. At the end of the execution of the second basic
block 159, the translated code returns control to the
translator 19, which then translates the next basic block,
and so forth.

Thus, a subject program running under the translator
19 has two different types of code that execute in an
interleaved manner: the translator code 19 and the
translated code 21. The translator code 19 is generated
by a compiler, prior to run-time, based on the high-level
source code implementation of the translator 19. The
translated code 21 is generated by the translator code 19,
throughout run-time, based on the subject code 17 of the
program being translated.

The representation of the subject processor state is
likewise divided between the translator 19 and translated
code 21 components. The translator 19 stores subject
processor state in a variety of explicit programming
language devices such as variables and/or objects; the
compiler used to compile the translator determines how the
state and operations are implemented in target code. The
translated code 21, by comparison, stores subject
processor state implicitly in target registers and memory
locations, which are manipulated directly by the target
instructions of the translated code 21.

For example, the low-level representation of the
global register store 27 is simply a region of allocated
memory. This is how the translated code 21 sees and
interacts with the abstract registers, by saving and
restoring between the defined memory region and various
target registers. In the source code of the translator
19, however, the global register store 27 is a data array
or an object which can be accessed and manipulated at a
higher level. With respect to the translated code 21,
there simply is no high-level representation.

In some cases, subject processor state which is static
or statically determinable in the translator 19 is encoded
directly into the translated code 21 rather than being
calculated dynamically. For example, the translator 19
may generate translated code 21 that is specialized on the
instruction type of the last flag-affecting instruction,
meaning that the translator would generate different
target code for the same basic block if the instruction
type of the last flag-affecting instruction changed.

The translator 19 contains data structures
corresponding to each basic block translation, which
particularly facilitates extended basic block, isoblock,
group block, and cached translation state optimizations as
hereafter described. Figure 3 illustrates such a basic
block data structure 30, which includes a subject address
31, a target code pointer 33 (i.e., the target address of
the translated code), translation hints, entry and exit
conditions 35, 36, a profiling metric 37, and references
to the data structures of the predecessor and successor
basic blocks. The basic block data structures are held in
a cache indexed by subject address. In one embodiment,
the data corresponding to a particular translated basic
block may be stored in a C++ object. The translator
creates a new basic block object as the basic block is
translated.
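
A rough Python rendering of such a per-block record and its cache is given below. The field names, their types and the class `BasicBlockCache` are assumptions chosen to mirror the fields named in the text (the reference numerals appear only in comments); the text's own embodiment uses C++ objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class BasicBlock:
    """Per-translation record, loosely following the fields named for Figure 3."""
    subject_address: int                       # subject starting address (31)
    target_code: Optional[Callable] = None     # target code pointer (33)
    entry_conditions: tuple = ()               # entry conditions (35)
    exit_conditions: tuple = ()                # exit conditions (36)
    profiling_metric: int = 0                  # profiling metric (37)
    translation_hints: dict = field(default_factory=dict)


class BasicBlockCache:
    """Repository of BasicBlock objects indexed by subject address."""

    def __init__(self):
        self._blocks = {}

    def insert(self, block):
        self._blocks[block.subject_address] = block

    def find(self, subject_address):
        return self._blocks.get(subject_address)


cache = BasicBlockCache()
cache.insert(BasicBlock(subject_address=0x4000, target_code=lambda: 0x4010))
hit = cache.find(0x4000)
miss = cache.find(0x5000)
```

Here `target_code` is a callable standing in for a function pointer to translated code; calling it returns the successor's subject address, and a `None` result from `find` signals that the block still needs to be translated.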

The subject address 31 of the basic block is the
starting address of that basic block in the memory space
of the subject program 17, meaning the memory location
where the basic block would be located if the subject
program 17 were running on the subject architecture. This
is also referred to as the subject starting address.
While each basic block corresponds to a range of subject
addresses (one for each subject instruction), the subject
starting address is the subject address of the first
instruction in the basic block.

The target address 33 of the basic block is the memory
location (starting address) of the translated code 21 in
the target program. The target address 33 is also
referred to as the target code pointer, or the target
starting address. To execute a translated block, the
translator 19 treats the target address as a function
pointer which is dereferenced to invoke (transfer control
to) the translated code.

The basic block data structures 30, 41, 42, 43, ...
are stored in the basic block cache 23, which is a
repository of basic block objects organized by subject
address. When the translated code of a basic block
finishes executing, it returns control to the translator
19 and also returns the value of the basic block's
destination (successor) subject address 31 to the
translator. To determine if the successor basic block has
already been translated, the translator 19 compares the
destination subject address 31 against the subject
addresses 31 of basic blocks in the basic block cache 23
(i.e., those that have already been translated). Basic
blocks which have not yet been translated are translated
and then executed. Basic blocks which have already been
translated (and which have compatible entry conditions, as
discussed below) are simply executed. Over time, many of
the basic blocks encountered will already have been
translated, which causes the incremental translation cost
to decrease. As such, the translator 19 gets faster over
time, as fewer and fewer blocks require translation.

Extended Basic Blocks



One optimization applied according to the illustrative
embodiment is to increase the scope of code generation by
a technique referred to as "extended basic blocks." In
cases where a basic block A has only one successor block
(e.g., basic block B), the translator may be able to
statically determine (when A is decoded) the subject
address of B. In such cases, basic blocks A and B are
combined into a single block (A'), which is referred to as
an extended basic block. Put differently, the extended
basic block mechanism can be applied to unconditional
jumps whose destination is statically determinable; if a
jump is conditional or if the destination cannot be
statically determined, then a separate basic block must be
formed. An extended basic block may still formally be a
basic block, because after the intervening jump from A to
B is removed, the code of block A' has only a single flow
of control.

Even if A has multiple possible successors including
B, extended basic blocks may be used to extend A into B
for a particular execution in which B is the actual
successor and B's address is statically determinable.

Statically determinable addresses are those the
translator can determine at decode-time. During
construction of a block's IR forest, an IR tree is
constructed for the destination subject address, which is
associated with the destination address abstract register.
If the value of the destination address IR tree is
statically determinable (i.e., does not depend on dynamic
or run-time subject register values), then the successor
block is statically determinable. For example, in the
case of an unconditional jump instruction, the destination
address (i.e., the subject starting address of the
successor block) is implicit in the jump instruction
itself; the subject address of the jump instruction plus
the offset encoded in the jump instruction equals the
destination address. Likewise, the optimizations of
constant folding (e.g., X + (2 + 3) => X + 5) and
expression folding (e.g., (X * 5) * 10 => X * 50) may
cause an otherwise "dynamic" destination address to become
statically determinable. The calculation of the
destination address thus consists of extracting the
constant value from the destination address IR.
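
The test for a statically determinable destination can be sketched with a tiny constant folder. This is an assumption-laden illustration: the IR is modelled as nested tuples, only constant folding is implemented (expression folding such as (X * 5) * 10 is not), and the helper names `fold` and `static_destination` are invented here.

```python
def fold(expr):
    """Fold constant subexpressions of a tiny IR made of nested tuples.

    An expression is an int constant, a str naming a run-time subject
    register, or a tuple (op, left, right) with op in {"+", "*"}.
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr[0], fold(expr[1]), fold(expr[2])
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    return (op, left, right)


def static_destination(expr):
    """Return the destination address if statically determinable, else None."""
    folded = fold(expr)
    return folded if isinstance(folded, int) else None


# Unconditional jump: address of the jump instruction plus its encoded offset.
jump_target = static_destination(("+", 0x1000, 0x24))
# Destination that depends on a run-time register value stays dynamic.
dynamic_target = static_destination(("+", "r5", ("+", 2, 3)))
```

When folding leaves nothing but an integer, that integer is the successor's subject starting address; any remaining register name means the destination is only known at run-time.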

When extended basic block A' is created, the
translator subsequently treats it the same as any other
basic block when performing IR generation, optimizations,
and code generation. Because the code generation
algorithms are operating on a larger scope (i.e., the code
of basic blocks A and B combined), the translator 19
generates more optimal code.

As one of ordinary skill in the art will appreciate,
decoding is the process of extracting individual subject
instructions from the subject code. The subject code is
stored as an unformatted byte stream (i.e., a collection
of bytes in memory). In the case of subject architectures
with variable-length instructions (e.g., X86), decoding
first requires the identification of instruction
boundaries; in the case of fixed-length instruction
architectures, identifying instruction boundaries is
trivial (e.g., on the MIPS, every four bytes is an
instruction). The subject instruction format is then
applied to the bytes that constitute a given instruction
to extract the instruction data (i.e., the instruction
type, operand register numbers, immediate field values,
and any other information encoded in the instruction).
The process of decoding machine instructions of a known
architecture from an unformatted byte stream using that
architecture's instruction format is well understood in
the art.

Figure 4 illustrates the creation of an extended basic
block. A set of constituent basic blocks which is
eligible to become an extended basic block is detected
when the earliest eligible basic block (A) is decoded. If
the translator 19 detects that A's successor (B) is
statically determinable 51, it calculates B's starting
address 53 and then resumes the decoding process at the
starting address of B. If B's successor (C) is determined
to be statically determinable 55, the decoding process
proceeds to the starting address of C, and so forth. Of
course, if a successor block is not statically
determinable then normal translation and execution resume.

During all basic block decoding, the working IR forest
includes an IR tree to calculate the subject address 31 of
the current block's successor (i.e., the destination
subject address; the translator has a dedicated abstract
register for the destination address). In the case of an
extended basic block, to compensate for the fact that
intervening jumps are being eliminated, as each new
constituent basic block is assimilated by the decoding
process, the IR tree for the calculation of that block's
subject address is pruned 54 (Figure 4). In other words,
when the translator 19 statically calculates B's address
and decoding resumes at B's starting address, the IR tree
corresponding to the dynamic calculation of B's subject
address 31 (which was constructed in the course of
decoding A) is pruned; when decoding proceeds to the
starting address of C, the IR tree corresponding to C's
subject address is pruned 59, and so forth. "Pruning" an
IR tree means to remove any IR nodes which are depended on
by the destination address abstract register and by no
other abstract registers. Put differently, pruning breaks
the link between the IR tree and the destination abstract
register; any other links to the same IR tree remain
unaffected. In some cases, a pruned IR tree may also be
depended on by another abstract register, in which case
the IR tree remains to preserve the subject program's
execution semantics.

To prevent code explosion (traditionally, the
mitigating factor against such code specialization
techniques), the translator limits extended basic blocks
to some maximum number of subject instructions. In one
embodiment, extended basic blocks are limited to a maximum
of 200 subject instructions.
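
The decoding procedure for extended basic blocks, including the instruction limit, can be sketched as follows. The representation is hypothetical: subject addresses are modelled as dictionary keys, each block is a pair of (instructions, terminator), and the policy of refusing to absorb a successor that would exceed the limit is one possible interpretation of the cap rather than a detail given in the text.

```python
MAX_INSTRUCTIONS = 200  # limit quoted in the text for one embodiment


def decode_extended_block(program, start, limit=MAX_INSTRUCTIONS):
    """Decode from `start`, following statically known unconditional jumps.

    `program` maps a subject address to (instructions, terminator), where
    terminator is ("jump", address) for a static unconditional jump or
    ("dynamic",) for anything else. Returns (instructions, terminator).
    """
    instructions = []
    address = start
    while True:
        body, terminator = program[address]
        instructions.extend(body)
        is_static = terminator[0] == "jump" and terminator[1] in program
        if not is_static:
            return instructions, terminator
        next_len = len(program[terminator[1]][0])
        if len(instructions) + next_len > limit:
            return instructions, terminator  # stop growing; keep the jump
        address = terminator[1]  # jump is elided; its address IR is pruned


program = {
    "A": (["a1", "a2"], ("jump", "B")),
    "B": (["b1"], ("jump", "C")),
    "C": (["c1", "c2"], ("dynamic",)),
}
merged, end = decode_extended_block(program, "A")
capped, capped_end = decode_extended_block(program, "A", limit=3)
```

With the default limit, blocks A, B and C are decoded as one extended block ending in C's dynamic terminator; with a limit of three instructions, decoding stops after B and the jump to C is kept as an ordinary block boundary.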



Isoblocks 






Another optimization implemented in the illustrated 
embodiment is so-called "isoblocking. -- According to this 
technique, translations of basic blocks are parameterized 
or specialized, on a compatibility list, which is a set of 
variable conditions that describe the subject processor 
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state and the translator state. The compatibility list is 
different for each subject architecture, to take into 
account different architectural features. The actual 
values of the compatibility conditions at the entry and 
5 exit of a particular basic block translation are referred 
to as entry conditions and exit conditions, respectively. 

If execution reaches a basic block which has already 
been translated but the previous translation's entry 
conditions differ from the current working conditions 
(i.e., the exit conditions of the previous block), then 
the basic block must be translated again, this time based 
on the current working conditions. The result is that the 
same subject code basic block is now represented by 
multiple target code translations. These different 
translations of the same basic block are referred to as 
isoblocks. 

To support isoblocks, the data associated with each 
basic block translation includes one set of entry 
conditions 35 and one set of exit conditions 36 (Figure 
3). In one embodiment, the basic block cache 23 is 
organized first by subject address 31 and then by entry 
conditions 35, 36 (Figure 3). In another embodiment, when 
the translator queries the basic block cache 23 for a 
subject address 31, the query may return multiple 
translated basic blocks (isoblocks). 

Figure 5 illustrates the use of isoblocks. At the end 
of a first translated block's execution, the translated 
code 21 calculates and returns the subject address of the 
next block (i.e., the successor) 71. Control is then 
returned to the translator 19, as demarcated by dashed 
line 73. In the translator 19, the basic block cache 23 
is queried using the returned subject address 31, step 75. 
The basic block cache may return zero, one, or more than 
one basic block data structures with the same subject 
address 31. If the basic block cache 23 returns zero data 
structures (meaning that this basic block has not yet been 
translated), then the basic block must be translated, step 
77, by the translator 19. Each data structure returned by 
the basic block cache 23 corresponds to a different 
translation (isoblock) of the same basic block of subject 
code. As illustrated at decision diamond 79, if the 
current exit conditions (of the first translated block) do 
not match the entry conditions of any of the data 
structures returned by the basic block cache 23, then the 
basic block must be translated again, step 81, this time 
parameterized on those exit conditions. If the current 
exit conditions match the entry conditions of one of the 
data structures returned by the basic block cache 23, then 
that translation is compatible and can be executed without 
re-translation, step 83. In the illustrative embodiment, 
the translator 19 executes the compatible translated block 
by dereferencing the target address as a function pointer. 
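
The lookup of Figure 5 can be sketched in C++ as follows. This is 
a minimal illustration only, not the translator's actual code: the 
names BasicBlockCache, Translation and lookup are hypothetical, the 
compatibility list is assumed to pack into a 64-bit integer, and 
"translation" is reduced to creating a record whose exit conditions 
are simply set equal to its entry conditions. 

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-ins; the source does not define these types.
using SubjectAddress = std::uint32_t;
using Conditions = std::uint64_t;  // assumption: compatibility list packed into bits

struct Translation {
    Conditions entry;  // entry conditions 35
    Conditions exit;   // exit conditions 36 (placeholder: copied from entry)
    int id;            // stands in for the generated target code
};

class BasicBlockCache {
public:
    // Returns the isoblock compatible with the working conditions,
    // translating first if none exists (steps 77 and 81). The flag
    // translated_now reports whether a (re-)translation was needed.
    const Translation& lookup(SubjectAddress addr, Conditions working,
                              bool& translated_now) {
        auto key = std::make_pair(addr, working);
        auto it = blocks_.find(key);
        translated_now = (it == blocks_.end());
        if (translated_now) {
            Translation t{working, working, next_id_++};
            it = blocks_.emplace(key, t).first;
        }
        return it->second;  // step 83: compatible translation
    }

    // Number of isoblocks currently held for one subject address.
    int isoblock_count(SubjectAddress addr) const {
        int n = 0;
        for (auto it = blocks_.lower_bound(std::make_pair(addr, Conditions{0}));
             it != blocks_.end() && it->first.first == addr; ++it) {
            ++n;
        }
        return n;
    }

private:
    // Organized first by subject address, then by entry conditions.
    std::map<std::pair<SubjectAddress, Conditions>, Translation> blocks_;
    int next_id_ = 0;
};
```

Under these assumptions, querying the same subject address with two 
different sets of working conditions yields two isoblocks, while a 
repeated query with matching conditions reuses the earlier one. 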



As noted above, basic block translations are 
preferably parameterized on a compatibility list. 
Exemplary compatibility lists will now be described for 
both the X86 and PowerPC architectures. 



X86 



An illustrative compatibility list for the X86 
architecture includes representations of: (1) lazy 
propagation of subject registers; (2) overlapping abstract 
registers; (3) type of pending condition code flag- 
affecting instruction; (4) lazy propagation of condition 
code flag-affecting instruction parameters; (5) direction 
of string copy operations; (6) floating point unit (FPU) 
mode of the subject processor; and (7) modifications of 
the segment registers. 

The compatibility list for the X86 architecture 
includes representations of any lazy propagation of 
subject registers by the translator, also referred to as 
register aliasing. Register aliasing occurs when the 
translator knows that two subject registers contain the 
same value at a basic block boundary. As long as the 
subject register values remain the same, only one of the 
corresponding abstract registers is synchronized, by 
saving it to the global register store. Until the saved 
subject register is overwritten, references to the non- 
saved register simply use or copy (via a move instruction) 
the saved register. This avoids two memory accesses (save 
+ restore) in the translated code. 



The compatibility list for the X86 architecture 
includes representations of which of the overlapping 
abstract registers are currently defined. In some cases, 
the subject architecture contains multiple overlapping 
subject registers which the translator represents using 
multiple overlapping abstract registers. For example, 
variable-width subject registers are represented using 
multiple overlapping abstract registers, one for each 
access size. For instance, the X86 "EAX" register can be 
accessed using any of the following subject registers, 
each of which has a corresponding abstract register: EAX 
(bits 31...0), AX (bits 15...0), AH (bits 15...8), and AL (bits 
7...0). 
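
The bit ranges above can be made concrete with a few helper 
functions. This is only a sketch of how the overlapping views 
relate to one another; the function names are hypothetical and the 
source does not say how the translator computes these values. 

```cpp
#include <cstdint>

// Views of a 32-bit EAX value, following the bit ranges in the text:
// AX = bits 15..0, AH = bits 15..8, AL = bits 7..0.
constexpr std::uint32_t ax(std::uint32_t eax) { return eax & 0xFFFFu; }
constexpr std::uint32_t ah(std::uint32_t eax) { return (eax >> 8) & 0xFFu; }
constexpr std::uint32_t al(std::uint32_t eax) { return eax & 0xFFu; }

// Writing AH replaces bits 15..8 and leaves the rest of EAX intact,
// which is why the overlapping abstract registers must be tracked.
constexpr std::uint32_t write_ah(std::uint32_t eax, std::uint32_t value) {
    return (eax & ~0xFF00u) | ((value & 0xFFu) << 8);
}
```
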

The compatibility list for the X86 architecture 
includes representations of, for each integer and floating 
point condition code flag, whether the flag value is 
normalized or pending, and if pending the type of the 
pending flag-affecting instruction. 

The compatibility list for the X86 architecture 
includes representations of register aliasing for 
condition code flag-affecting instruction parameters (if 
some subject register still holds the value of a flag- 
affecting instruction parameter, or if the value of the 
second parameter is the same as the first). The 
compatibility list also includes representations of 
whether the second parameter is a small constant (i.e., an 
immediate instruction candidate), and if so its value. 

The compatibility list for the X86 architecture 
includes a representation of the current direction of 
string copy operations in the subject program. This 
condition field indicates whether string copy operations 
move upward or downward in memory. This supports code 
specialization of "strcpy()" function calls, by 
parameterizing translations on the function's direction 
argument. 

The compatibility list for the X86 architecture 
includes a representation of the FPU mode of the subject 
processor. The FPU mode indicates whether subject 
floating-point instructions are operating in 32- or 64-bit 
mode. 

The compatibility list for the X86 architecture 
includes a representation of modifications of the segment 
registers. All X86 instruction memory references are 
based on one of six memory segment registers: CS (code 
segment), DS (data segment), SS (stack segment), ES (extra 
data segment), FS (general purpose segment), and GS 
(general purpose segment). Under normal circumstances an 
application will not modify the segment registers. As 
such, code generation is by default specialized on the 
assumption that the segment register values remain 
constant. It is possible, however, for a program to 
modify its segment registers, in which case the 
corresponding segment register compatibility bit will be 
set, causing the translator to generate code for 
generalized memory accesses using the appropriate segment 
register's dynamic value. 

An illustrative embodiment of a compatibility list for 
the PowerPC architecture includes representations of: (1) 
mangled registers; (2) link value propagation; (3) type of 
pending condition code flag-affecting instruction; (4) 
lazy propagation of condition code flag-affecting 
instruction parameters; (5) condition code flag value 
aliasing; and (6) summary overflow flag synchronization 
state. 

The compatibility list for the PowerPC architecture 
includes a representation of mangled registers. In cases 
where the subject code contains multiple consecutive 
memory accesses using a subject register for the base 
address, the translator may translate those memory 
accesses using a mangled target register. In cases where 
subject program data is not located at the same address in 
target memory as it would have been in subject memory, the 
translator must include a target offset in every memory 
address calculated by the subject code. While the subject 
register contains the subject base address, a mangled 
target register contains the target address corresponding 
to that subject base address (i.e., subject base address + 
target offset). With register mangling, memory accesses 
can be translated more efficiently by applying the subject 
code offsets directly to the target base address, stored 
in the mangled register. By comparison, without the 
mangled register mechanism this scenario would require 
additional manipulation of the target code for each memory 
access, at the cost of both space and execution time. The 
compatibility list indicates which abstract registers, if 
any, are mangled. 
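
The saving from register mangling can be illustrated with simple 
address arithmetic. The sketch below is an assumption-laden 
illustration, not the translator's code: Addr, plain_access, mangle 
and mangled_access are hypothetical names, and addresses are modeled 
as 64-bit integers. 

```cpp
#include <cstdint>

using Addr = std::uint64_t;  // assumption: addresses modeled as 64-bit integers

// Without mangling: every access re-adds the target offset.
inline Addr plain_access(Addr subject_base, Addr target_offset, Addr displacement) {
    return (subject_base + displacement) + target_offset;
}

// With mangling: the target offset is folded into the base once...
inline Addr mangle(Addr subject_base, Addr target_offset) {
    return subject_base + target_offset;
}

// ...and each subject-code displacement is then applied directly.
inline Addr mangled_access(Addr mangled_base, Addr displacement) {
    return mangled_base + displacement;
}
```

Both paths produce the same target address; the mangled form simply 
performs one addition per access instead of two. 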



The compatibility list for the PowerPC architecture 
includes a representation of link value propagation. For 
leaf functions (i.e., functions that call no other 
functions), the function body may be extended (as with the 
extended basic block mechanism discussed above) into the 
call/return site. Hence, the function body and the code 
that follows the function's return are translated 
together. This is also referred to as function return 
specialization, because such a translation includes code 
from, and is therefore specialized on, the function's 
return site. Whether a particular block translation used 
link value propagation is reflected in the exit 
conditions. As such, when the translator encounters a 
block whose translation used link value propagation, it 
must evaluate whether the current return site will be the 
same as the previous return site. Functions return to the 
same location from which they are called, so the call site 
and return site are effectively the same (offset by one or 
two instructions). The translator can therefore determine 
whether the return sites are the same by comparing the 
respective call sites; this is equivalent to comparing the 
subject addresses of the respective predecessor blocks (of 
the function block's prior and current executions). As 
such, in embodiments that support link value propagation, 
the data associated with each basic block translation 
includes a reference to the predecessor block translation 
(or some other representation of the predecessor block's 
subject address). 
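
The predecessor comparison described above might look like the 
following. This is a hedged sketch: the BlockTranslation record and 
the return_site_matches function are hypothetical, and the 
predecessor is represented by its subject address, which the text 
names as one permitted representation. 

```cpp
#include <cstdint>

using SubjectAddress = std::uint32_t;

// Hypothetical record of one block translation.
struct BlockTranslation {
    SubjectAddress subject_address;
    bool used_link_value_propagation;
    SubjectAddress predecessor_address;  // predecessor (call site) of the prior execution
};

// True if the cached translation may be reused when the block is
// entered from current_predecessor.
inline bool return_site_matches(const BlockTranslation& t,
                                SubjectAddress current_predecessor) {
    if (!t.used_link_value_propagation) {
        return true;  // translation is not specialized on a return site
    }
    return t.predecessor_address == current_predecessor;
}
```
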

The compatibility list for the PowerPC architecture 
includes representations of, for each integer and floating 
point condition code flag, whether the flag value is 
normalized or pending, and if pending the type of the 
pending flag-affecting instruction. 

The compatibility list for the PowerPC architecture 
includes representations of register aliasing for flag- 
affecting instruction parameters (if flag-affecting 
instruction parameter values happen to be live in a 
subject register, or if the value of the second parameter 
is the same as the first). The compatibility list also 
includes representations of whether the second parameter 
is a small constant (i.e., an immediate instruction 
candidate), and if so its value. 

The compatibility list for the PowerPC architecture 
includes representations of register aliasing for the 
PowerPC condition code flag values. The PowerPC 
architecture includes instructions for explicitly loading 
the entire set of PowerPC flags into a general purpose 
(subject) register. This explicit representation of the 
subject flag values in subject registers interferes with 
the translator's condition code flag emulation 
optimizations. The compatibility list contains a 
representation of whether the flag values are live in a 
subject register, and if so which register. During IR 
generation, references to such a subject register while it 
holds the flag values are translated into references to 
the corresponding abstract registers. This mechanism 
eliminates the need to explicitly calculate and store the 
subject flag values in a target register, which in turn 
allows the translator to apply the standard condition code 
flag optimizations. 






The compatibility list for the PowerPC architecture 
includes a representation of summary overflow 
synchronization. This field indicates which of the eight 
summary overflow condition bits are current with the 
global summary overflow bit. When one of the PowerPC's 
eight condition fields is updated, if the global summary 
overflow is set, it is copied to the corresponding summary 
overflow bit in the particular condition code field. 

Translation Hints 

Another optimization implemented in the illustrative 
embodiment employs the translation hints 34 of the basic 
block data structure of Figure 3. This optimization 
proceeds from a recognition that there is static basic 
block data which is specific to a particular basic block 
but which is the same for every translation of that block. 
For some types of static data which are expensive to 
calculate, it is more efficient for the translator to 
calculate the data once, during the first translation of 
the corresponding block, and then store the result for 
future translations of the same block. Because this data 
is the same for every translation of the same block, it 
does not parameterize translation and therefore it is not 
formally part of the block's compatibility list (discussed 
above). Expensive static data is still stored in the data 
associated with each basic block translation, however, as 
it is cheaper to save the data than it is to recalculate. 
In later translations of the same block, even if the 
translator 19 cannot reuse a prior translation, the 
translator 19 can take advantage of these "translation 
hints" (i.e., the cached static data) to reduce the 
translation cost of the second and later translations. 

In one embodiment, the data associated with each basic 
block translation includes translation hints, which are 
calculated once during the first translation of that block 
and then copied (or referred to) on each subsequent 
translation. 

For example, in a translator 19 implemented in C++, 
translation hints may be implemented as a C++ object, in 
which case the basic block objects which correspond to 
different translations of the same block would each store 
a reference to the same translation hints object. 
Alternatively, in a translator implemented in C++, the 
basic block cache 23 may contain one basic block object 
per subject basic block (rather than per translation), 
with each such object containing or holding a reference to 
the corresponding translation hints; such basic block 
objects also contain multiple references to translation 
objects that correspond to different translations of that 
block, organized by entry conditions. 
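
The first arrangement (one basic block object per translation, all 
referring to a single hints object) can be sketched as below. The 
class and member names are hypothetical, std::shared_ptr is one 
possible way to hold the shared reference, and the expensive decode 
is stood in for by a counter. 

```cpp
#include <cstdint>
#include <map>
#include <memory>

// Static data that is identical for every translation of a block.
struct TranslationHints {
    int initial_prefix_count = 0;
    bool has_repeat_prefix = false;
};

// One object per translation (isoblock) of a subject block.
struct BasicBlock {
    std::uint32_t subject_address;
    std::uint64_t entry_conditions;
    std::shared_ptr<const TranslationHints> hints;  // shared between isoblocks
};

class HintTable {
public:
    // Returns the cached hints, computing them only on the first request.
    std::shared_ptr<const TranslationHints> hints_for(std::uint32_t addr) {
        auto it = table_.find(addr);
        if (it == table_.end()) {
            ++decodes_;  // stands in for the expensive first-time decode
            it = table_.emplace(addr, std::make_shared<TranslationHints>()).first;
        }
        return it->second;
    }

    int decodes() const { return decodes_; }

private:
    std::map<std::uint32_t, std::shared_ptr<const TranslationHints>> table_;
    int decodes_ = 0;
};
```

With this layout, two isoblocks of the same subject address hold the 
same hints object and the decode work is counted once. 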



Exemplary translation hints for the X86 architecture 
include representations of: (1) initial instruction 
prefixes; and (2) initial repeat prefixes. Such 
translation hints for the X86 architecture particularly 
include a representation of how many prefixes the first 
instruction in the block has. Some X86 instructions have 
prefixes which modify the operation of the instruction. 
This architectural feature makes it difficult (i.e., 
expensive) to decode an X86 instruction stream. Once the 
number of initial prefixes is determined during the first 
decoding of the block, that value is then stored by the 
translator 19 as a translation hint, so that subsequent 
translations of the same block do not need to determine it 
anew. 

The translation hints for the X86 architecture further 
include a representation of whether the first instruction 
in the block has a repeat prefix. Some X86 instructions 
such as string operations have a repeat prefix which tells 
the processor to execute that instruction multiple times. 
The translation hints indicate whether such a prefix is 
present, and if so its value. 

In one embodiment, the translation hints associated 
with each basic block additionally include the entire IR 
forest corresponding to that basic block. This 
effectively caches all of the decoding and IR generation 
performed by the frontend. In another embodiment, the 
translation hints include the IR forest as it exists prior 
to being optimized. In another embodiment, the IR forest 
is not cached as a translation hint, in order to conserve 
the memory resources of the translated program. 



Group Blocks 



Another optimization implemented in the illustrative 
translator embodiment is directed to eliminating program 
overhead resulting from the necessity to synchronize all 
abstract registers at the end of execution of each 
translated basic block. This optimization is referred to 
as group block optimization. 

As discussed above, in basic block mode (e.g., Figure 
2), state is passed from one basic block to the next using 
a memory region which is accessible to all translated code 
sequences, namely, a global register store 27. The global 
register store 27 is a repository for abstract registers, 
each of which corresponds to and emulates the value of a 
particular subject register or other subject architectural 
feature. During the execution of translated code 21, 
abstract registers are held in target registers so that 
they may participate in instructions. During the 
execution of translator code 19, abstract register values 
are stored in the global register store 27 or target 
registers 15. 

Thus, in basic block mode such as illustrated in 
Figure 2, all abstract registers must be synchronized at 
the end of each basic block for two reasons: (1) control 
returns to the translator code 19, which potentially 
overwrites all target registers; and (2) because code 
generation sees only one basic block at a time, the 
translator 19 must assume that all abstract register 
values are live (i.e., will be used in subsequent basic 
blocks) and therefore must be saved. The goal of the 
group block optimization mechanism is to reduce 
synchronization across basic block boundaries that are 
crossed frequently, by translating multiple basic blocks 
as a contiguous whole. By translating multiple basic 
blocks together, the synchronization at block boundaries 
can be minimized if not eliminated. 

Group block construction is triggered when the current 
block's profiling metric reaches a trigger threshold. 
This block is referred to as the trigger block. 
Construction can be separated into the following steps 
(Figure 6): (1) selecting member blocks 71; (2) ordering 
member blocks 73; (3) global dead code elimination 75; (4) 
global register allocation 77; and (5) code generation 79. 
The first step 71 identifies the set of blocks that are to 
be included in the group block by performing a depth-first 
search (DFS) traversal of the program's control flow 
graph, beginning with the trigger block and tempered by an 
inclusion threshold and a maximum member limit. The 
second step 73 orders the set of blocks and identifies the 
critical path through the group block, to enable efficient 
code layout that minimizes synchronization code and 
reduces branches. The third and fourth steps 75, 77 
perform optimizations. The final step 79 generates target 
code for all member blocks in turn, producing efficient 
code layout with efficient register allocation. 

In construction of a group block and generation of 
target code therefrom, the translator code 19 implements 
the steps illustrated in Figure 6. When the translator 19 
encounters a basic block that was previously translated, 
prior to executing that block, the translator 19 checks 
the block's profiling metric 37 (Figure 3) against the 
trigger threshold. The translator 19 begins group block 
creation when a basic block's profiling metric 37 exceeds 
the trigger threshold. The translator 19 identifies the 
members of the group block by a traversal of the control 
flow graph, starting with the trigger block and tempered 
by the inclusion threshold and maximum member limit. 
Next, the translator 19 creates an ordering of the member 
blocks, which identifies the critical path through the 
group block. The translator 19 then performs global dead 
code elimination; the translator 19 gathers register 
liveness information for each member block, using the IR 
corresponding to each block. Next, the translator 19 
performs global register allocation according to an 
architecture-specific policy, which defines a partial set 
of uniform register mappings for all member blocks. 
Finally, the translator 19 generates target code for each 
member block in order, consistent with the global register 
allocation constraints and using the register liveness 
analyses. 

As noted above, the data associated with each basic 
block includes a profiling metric 37. In one embodiment, 
the profiling metric 37 is execution count, meaning that 
the translator 19 counts the number of times a particular 
basic block has been executed; in this embodiment, the 
profiling metric 37 is represented as an integer count 
field (counter). In another embodiment, the profiling 
metric 37 is execution time, meaning that the translator 
19 keeps a running aggregate of the execution time for all 
executions of a particular basic block, such as by 
planting code in the beginning and end of a basic block to 
start and stop, respectively, a hardware or software 
timer; in this embodiment, the profiling metric 37 uses 
some representation of the aggregate execution time 
(timer). In another embodiment, the translator 19 stores 
multiple types of profiling metrics 37 for each basic 
block. In another embodiment, the translator 19 stores 
multiple sets of profiling metrics 37 for each basic 
block, corresponding to each predecessor basic block 
and/or each successor basic block, such that distinct 
profiling data is maintained for different control paths. 
In each translator cycle (i.e., the execution of 
translator code 19 between executions of translated code 
21), the profiling metric 37 for the appropriate basic 
block is updated. 

In embodiments that support group blocks, the data 
associated with each basic block additionally includes 
references 38, 39 to the basic block objects of known 
predecessors and successors. These references in 
aggregate constitute a control-flow graph of all 
previously executed basic blocks. During group block 
formation, the translator 19 traverses this control-flow 
graph to determine which basic blocks to include in the 
group block under formation. 



Group block formation in the illustrative embodiment 
is based on three thresholds: a trigger threshold, an 
inclusion threshold, and a maximum member limit. The 
trigger threshold and the inclusion threshold refer to the 
profiling metric 37 for each basic block. In each 
translator cycle, the profiling metric 37 of the next 
basic block is compared to the trigger threshold. If the 
metric 37 meets the trigger threshold then group block 
formation begins. The inclusion threshold is then used to 
determine the scope of the group block, by identifying 
which successor basic blocks to include in the group 
block. The maximum member limit defines the upper limit 
on the number of basic blocks to be included in any one 
group block. 

When the trigger threshold is reached for basic block 
A, a new group block is formed with A as the trigger 
block. The translator 19 then begins the definition 
traversal, a traversal of A's successors in the control- 
flow graph to identify other member blocks to include. 
When traversal reaches a given basic block, its profiling 
metric 37 is compared to the inclusion threshold. If the 
metric 37 meets the inclusion threshold, that basic block 
is marked for inclusion and the traversal continues to the 
block's successors. If the block's metric 37 is below the 
inclusion threshold, that block is excluded and its 
successors are not traversed. When traversal ends (i.e., 
all paths either reach an excluded block or cycle back to 
an included block, or the maximum member limit is 
reached), the translator 19 constructs a new group block 
based on all of the included basic blocks. 
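
The definition traversal can be sketched as a depth-first walk over 
successor links. The code below is an illustration under stated 
assumptions: blocks are identified by their index in a vector, the 
profiling metric is an integer, the trigger block is always 
included, and the names Block and select_members are hypothetical. 

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical control-flow graph node.
struct Block {
    int metric;                   // profiling metric 37 (e.g., execution count)
    std::vector<int> successors;  // indices of known successor blocks
};

// Depth-first traversal from the trigger block. A block is included if
// its metric meets the inclusion threshold; the successors of excluded
// blocks are not traversed; traversal stops at the member limit.
std::set<int> select_members(const std::vector<Block>& cfg, int trigger,
                             int inclusion_threshold, std::size_t max_members) {
    std::set<int> members;
    std::vector<int> stack{trigger};
    while (!stack.empty() && members.size() < max_members) {
        int b = stack.back();
        stack.pop_back();
        if (members.count(b)) {
            continue;  // path cycled back to an included block
        }
        if (b != trigger && cfg[b].metric < inclusion_threshold) {
            continue;  // excluded block: do not follow its successors
        }
        members.insert(b);
        for (int s : cfg[b].successors) {
            stack.push_back(s);
        }
    }
    return members;
}
```

Note that a hot block reachable only through an excluded block is 
not visited, which matches the rule that an excluded block's 
successors are not traversed. 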

In embodiments that use isoblocks and group blocks, 
the control flow graph is a graph of isoblocks, meaning 
that different isoblocks of the same subject block are 
treated as different blocks for the purposes of group 
block creation. Thus, the profiling metrics for different 
isoblocks of the same subject block are not aggregated. 

In another embodiment, isoblocks are not used in basic 
block translation but are used in group block translation, 
meaning that non-group basic block translations are 
generalized (not specialized on entry conditions). In 
this embodiment, a basic block's profiling metric is 
disaggregated by the entry conditions of each execution, 
such that distinct profiling information is maintained for 
each theoretical isoblock (i.e., for each distinct set of 
entry conditions). In this embodiment, the data 
associated with each basic block includes a profiling 
list, each member of which is a three-item set containing: 
(1) a set of entry conditions, (2) a corresponding 
profiling metric, and (3) a list of corresponding 
successor blocks. This data maintains profiling and 
control path information for each set of entry conditions 
to the basic block, even though the actual basic block 
translation is not specialized on those entry conditions. 
In this embodiment, the trigger threshold is compared to 
each profiling metric within a basic block's profiling 
metric list. When the control flow graph is traversed, 
each element in a given basic block's profiling list is 
treated as a separate node in the control flow graph. The 
inclusion threshold is therefore compared against each 
profiling metric in the block's profiling list. In this 
embodiment, group blocks are created for particular hot 
isoblocks (specialized to particular entry conditions) of 
hot subject blocks, but other isoblocks of those same 
subject blocks are executed using the general (non- 
isoblock) translations of those blocks. 

After the definition traversal, the translator 19 
performs an ordering traversal, step 73 (Figure 6), to 
determine the order in which member blocks will be 
translated. The order of the member blocks affects both 
the instruction cache behavior of the translated code 21 
(hot paths should be contiguous) and the synchronization 
necessary on member block boundaries (synchronization 
should be minimized along hot paths). In one embodiment, 
the translator 19 performs the ordering traversal using an 
ordered depth-first search (DFS) algorithm, ordered by 
execution count. Traversal starts at the member block 
having the highest execution count. If a traversed member 
block has multiple successors, the successor with the 
higher execution count is traversed first. 
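
An ordered depth-first search of this kind can be sketched as 
follows. The names Block, order_members and detail::visit are 
hypothetical, blocks are identified by vector index, and appending 
any members not reachable from the hottest block in index order is 
an assumption added here so that every member receives a position. 

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical control-flow graph node.
struct Block {
    int execution_count;
    std::vector<int> successors;  // indices of successor blocks
};

namespace detail {
// Pre-order visit that follows the hottest successor first.
inline void visit(const std::vector<Block>& cfg, const std::set<int>& members,
                  int b, std::set<int>& seen, std::vector<int>& order) {
    if (!members.count(b) || !seen.insert(b).second) {
        return;  // not a member, or already placed
    }
    order.push_back(b);
    std::vector<int> succ = cfg[b].successors;
    std::stable_sort(succ.begin(), succ.end(), [&](int x, int y) {
        return cfg[x].execution_count > cfg[y].execution_count;
    });
    for (int s : succ) {
        visit(cfg, members, s, seen, order);
    }
}
}  // namespace detail

// Returns the member blocks in translation order.
std::vector<int> order_members(const std::vector<Block>& cfg,
                               const std::set<int>& members) {
    std::vector<int> order;
    if (members.empty()) {
        return order;
    }
    std::set<int> seen;
    // Traversal starts at the member with the highest execution count.
    int start = *std::max_element(members.begin(), members.end(),
                                  [&](int x, int y) {
        return cfg[x].execution_count < cfg[y].execution_count;
    });
    detail::visit(cfg, members, start, seen, order);
    // Assumption: remaining members are appended in index order.
    for (int m : members) {
        detail::visit(cfg, members, m, seen, order);
    }
    return order;
}
```
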

One of ordinary skill in the art will appreciate that 
group blocks are not formal basic blocks, as they may have 
internal control branches, multiple entry points, and/or 
multiple exit points. 

Once a group block has been formed, a further 
optimization may be applied to it, referred to herein as 
"global dead code elimination." Such global dead code 
elimination employs the technique of liveness analysis. 
Global dead code elimination is the process of removing 
redundant work from the IR across a group of basic blocks. 

Generally, subject processor state must be 
synchronized on translation scope boundaries. A value, 
such as a subject register, is said to be "live" for the 
range of code starting with its definition and ending with 
its last use prior to being re-defined (overwritten); 
hence, the analysis of values' (e.g., temporary values in 
the context of IR generation, target registers in the 
context of code generation, or subject registers in the 
context of translation) uses and definitions is known in 
the art as liveness analysis. Whatever knowledge (i.e., 
liveness analysis) the translator has regarding the uses 
(reads) and definitions (writes) of data and state is 
limited to its translation scope; the rest of the program 
is an unknown. More specifically, because the translator 
does not know which subject registers will be used outside 
the scope of translation (e.g., in a successor basic 
block), it must assume that all registers will be used. 
As such, the values (definitions) of any subject registers 
which were modified within a given basic block must be 
saved (stored to the global register store 27) at the end 
of that basic block, against the possibility of their 
future use. Likewise, all subject registers whose values 
will be used in a given basic block must be restored 
(loaded from the global register store 27) at the 
beginning of that basic block; i.e., the translated code 
for a basic block must restore a given subject register 
prior to its first use within that basic block. 

The general mechanism of IR generation involves an 
implicit form of "local" dead code elimination, whose 
scope is localized to only a small group of IR nodes at 
once. For example, a common subexpression A in the 
subject code would be represented by a single IR tree for 
A with multiple parent nodes, rather than multiple 
instances of the expression tree A itself. The 
"elimination" is implicit in the fact that one IR node can 
have links to multiple parent nodes. Likewise, the use of 
abstract registers as IR placeholders is an implicit form 
of dead code elimination. If the subject code for a given 
basic block never defines a particular subject register, 
then at the end of IR generation for that block, the 
abstract register corresponding to that subject register 
will refer to an empty IR tree. The code generation phase 
recognizes that, in this scenario, the appropriate 
abstract register need not be synchronized with the global 
register store. As such, local dead code elimination is 
implicit in the IR generation phase, occurring 
incrementally as IR nodes are created. 

In contrast to local dead code elimination, a "global" 
dead code elimination algorithm is applied to a basic 
block's entire IR expression forest. Global dead code 
elimination according to the illustrative embodiment 
requires liveness analysis, meaning analysis of subject 
register uses (reads) and subject register definitions 
(writes) within the scope of each basic block in a group 
block, to identify live and dead regions. The IR is 
transformed to remove dead regions and thereby reduce the 
amount of work that must be performed by the target code. 
For example, at a given point in the subject code, if the 
translator 19 recognizes or detects that a particular 
subject register will be defined (overwritten) before its 
next use, the subject register is said to be dead at all 
points in the code up to that preempting definition. In 
terms of the IR, subject registers which are defined but 
never used before being re-defined are dead code which can 
be eliminated in the IR phase without ever spawning target 
code. In terms of target code generation, target 
registers which are dead can be used for other temporary 
or subject register values without spilling. 
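
The notion of a definition that is dead because it is overwritten 
before its next use can be illustrated with a conventional backward 
liveness scan over a straight-line sequence. This is a textbook 
technique used here for illustration, not the translator's 
IR-based algorithm: the names Instr and dead_definitions are 
hypothetical, each instruction is assumed to write at most one 
register, and instructions are assumed to have no side effects other 
than that write. 

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical instruction summary for liveness purposes.
struct Instr {
    int def;                // subject register written, or -1 for none
    std::vector<int> uses;  // subject registers read
};

// Backward scan. `live` starts as the set of registers assumed live at
// the end of the scope (all registers, if nothing is known about the
// successors). Returns the indices of instructions whose definition is
// overwritten before any use, i.e. dead definitions.
std::vector<std::size_t> dead_definitions(const std::vector<Instr>& code,
                                          std::set<int> live) {
    std::vector<std::size_t> dead;
    for (std::size_t i = code.size(); i-- > 0;) {
        const Instr& in = code[i];
        if (in.def >= 0) {
            if (!live.count(in.def)) {
                dead.push_back(i);  // value is never read: dead
                continue;           // its operands are not made live
            }
            live.erase(in.def);     // definition ends the live range
        }
        live.insert(in.uses.begin(), in.uses.end());
    }
    std::reverse(dead.begin(), dead.end());  // report in program order
    return dead;
}
```

In the sequence "r1 = ...; r2 = r1; r2 = ...; r3 = r2", the second 
instruction is reported as dead even when every register is assumed 
live at the end, because r2 is re-defined before it is read. 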

In group block global dead code elimination, liveness 
analysis is performed on all member blocks. Liveness 
analysis generates the IR forest for each member block, 
which is then used to derive the subject register liveness 
information for that block. IR forests for each member 
block are also needed in the code generation phase of 
group block creation. Once the IR for each member block
is generated in liveness analysis, it can either be saved
for subsequent use in code generation, or it can be
deleted and re-generated during code generation.

Group block global dead code elimination can
effectively "transform" the IR in two ways. First, the IR
forest generated for each member block during liveness
analysis can be modified, and then that entire IR forest
can be propagated to (i.e., saved and reused during) the
code generation phase; in this scenario, the IR
transformations are propagated through the code generation
phase by applying them directly to the IR forest and then
saving the transformed IR forest. In this scenario, the
data associated with each member block includes liveness
information (to be additionally used in global register
allocation), and the transformed IR forest for that block.

Alternatively and preferably, the step of global dead
code elimination which transforms the IR for a member
block is performed during the final code generation phase
of group block creation, using liveness information
created earlier. In this embodiment, the global dead
code transformations can be recorded as a list of "dead"
subject registers, which is then encoded in the liveness
information associated with each member block. The actual
transformation of the IR forest is thus performed by the
subsequent code generation phase, which uses the dead
register list to prune the IR forest. This scenario
allows the translator to generate the IR once during
liveness analysis, then throw the IR away, and then
re-generate the same IR during code generation, at which
point the IR is transformed using the liveness analysis
(i.e., global dead code elimination is applied to the IR
itself). In this scenario, the data associated with each
member block includes liveness information, which includes
a list of dead subject registers. The IR forest is not
saved. Specifically, after the IR forest is (re)generated
in the code generation phase, the IR trees for dead
subject registers (which are listed in the dead subject
register list within the liveness information) are pruned.
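
The regenerate-then-prune flow described above can be
sketched as follows. This is a minimal illustration under
stated assumptions: the IR forest is modelled as a
dictionary from subject register name to an expression
tree, and generate_ir, prune_ir_forest and the liveness_info
layout are hypothetical names, not the translator's actual
data structures.

```python
def generate_ir(block):
    """Stand-in for IR generation: one expression tree per defined register."""
    return {reg: ("expr", text) for reg, text in block}


def prune_ir_forest(ir_forest, dead_registers):
    """Drop the IR trees of registers listed as dead in the liveness info."""
    dead = set(dead_registers)
    return {reg: tree for reg, tree in ir_forest.items() if reg not in dead}


# Hypothetical member block: (subject register, defining expression).
block = [("r1", "r2 + 4"), ("r3", "load [r1]"), ("r5", "r3 * 2")]

# Only the dead register list is kept after liveness analysis;
# the IR forest built during that analysis is assumed discarded.
liveness_info = {"dead": ["r5"]}

forest = generate_ir(block)  # regenerated during code generation
pruned = prune_ir_forest(forest, liveness_info["dead"])
print(sorted(pruned))
```

Because the same generation routine is applied to the same
subject code, the regenerated forest matches the one seen
by liveness analysis, so the stored register names remain
valid keys for pruning.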

In one embodiment, the IR created during liveness
analysis is thrown away after the liveness information is
extracted, to conserve memory resources. The IR forests
(one per member block) are recreated during code
generation, one member block at a time. In this
embodiment, the IR forests for all member blocks do not
coexist at any point in translation. However, the two
versions of the IR forests, created during liveness
analysis and code generation, respectively, are identical,
as they are generated from the subject code using the same
IR generation process.

In another embodiment, the translator creates an IR
forest for each member block during liveness analysis, and
then saves the IR forest, in the data associated with each
member block, to be reused during code generation. In
this embodiment, the IR forests for all member blocks
coexist, from the end of liveness analysis (in the global
dead code elimination step) to code generation. In one
alternative of this embodiment, no transformations or
optimizations are performed on the IR during the period
between its initial creation (during liveness analysis) and
its last use (code generation).

In another embodiment, the IR forests for all member
blocks are saved between the steps of liveness analysis
and code generation, and inter-block optimizations are
performed on the IR forests prior to code generation. In
this embodiment, the translator takes advantage of the
fact that all member block IR forests coexist at the same
point in translation, and optimizations are performed
across the IR forests of different member blocks which
transform those IR forests. In this case, the IR forests
used in code generation may not be identical to the IR
forests used in liveness analysis (as in the two
embodiments described above), because the IR forests have
been subsequently transformed by inter-block
optimizations. In other words, the IR forests used in
code generation may be different from the IR forests that
would result from generating them anew one member block at
a time.

In group block global dead code elimination, the scope
of dead code detection is increased by the fact that
liveness analysis is applied to multiple blocks at the
same time. Hence, if a subject register is defined in the
first member block, and then redefined in the third member
block (with no intervening uses or exit points), the IR
tree for the first definition can be eliminated from the
first member block. By comparison, under basic block code
generation, the translator 19 would be unable to detect
that this subject register was dead.
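
The wider scope can be illustrated with a simplified
model. The sketch below assumes, purely for illustration,
that the member blocks form a straight chain, that each
block is summarized by the registers it reads before
writing ("reads"), the registers it writes ("writes"), and
whether it has an exit from the group at its end
("exits_group"). These field names and the function
definition_is_dead are hypothetical.

```python
def definition_is_dead(blocks, index, reg):
    """True if reg, defined in blocks[index], is overwritten in a later
    block with no intervening read and no intervening exit point."""
    if blocks[index]["exits_group"]:
        return False  # the value may be needed outside the group
    for block in blocks[index + 1:]:
        if reg in block["reads"]:
            return False  # value is used before being overwritten
        if reg in block["writes"]:
            return True   # redefined first: earlier definition is dead
        if block["exits_group"]:
            return False
    return False  # end of known code reached: conservatively live


blocks = [
    {"reads": set(), "writes": {"r3"}, "exits_group": False},   # first member
    {"reads": {"r1"}, "writes": {"r2"}, "exits_group": False},  # second member
    {"reads": set(), "writes": {"r3"}, "exits_group": True},    # third member
]

group_view = definition_is_dead(blocks, 0, "r3")
basic_block_view = definition_is_dead(blocks[:1], 0, "r3")
print(group_view, basic_block_view)
```

Seen across the whole group, the first definition of r3 is
dead; seen from the first block alone, the analysis has to
assume the value is still needed.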

As noted above, one goal of group block optimization
is to reduce or eliminate the need for register
synchronization at basic block boundaries. Accordingly, a
discussion of how register allocation and synchronization
are achieved by the translator 19 during group blocking is
now provided.

Register allocation is the process of associating an
abstract (subject) register with a target register.
Register allocation is a necessary component of code
generation, as abstract register values must reside in
target registers to participate in target instructions.
The representation of these allocations (i.e., mappings)
between target registers and abstract registers is
referred to as a register map. During code generation,
the translator 19 maintains a working register map, which
reflects the current state of register allocation (i.e.,
the target-to-abstract register mappings actually in
existence at a given point in the target code). Reference
will be had hereafter to an exit register map which is,
abstractly, a snapshot of the working register map on exit
from a member block. However, since the exit register map
is not needed for synchronization, it is not recorded and
is therefore purely abstract. The entry register map 40
(Figure 3) is a snapshot of the working register map on
entry to a member block, which is necessary to record for
synchronization purposes.

Also, as discussed above, a group block contains
multiple member blocks, and code generation is performed
separately for each member block. As such, each member
block has its own entry register map 40 and exit register
map, which reflect the allocation of particular target
registers to particular subject registers at the beginning
and end, respectively, of the translated code for that
block.

Code generation for a group member block is
parameterized by its entry register map 40 (the working
register map on entry), but code generation also modifies
the working register map. The exit register map for a
member block reflects the working register map at the end
of that block, as modified by the code generation process.
When the first member block is translated, the working
register map is empty (subject to global register
allocation, discussed below). At the end of translation
for the first member block, the working register map
contains the register mappings created by the code
generation process. The working register map is then
copied into the entry register maps 40 of all successor
member blocks.

At the end of code generation for a member block, some
abstract registers may not require synchronization.
Register maps allow the translator 19 to minimize
synchronization on member block boundaries, by identifying
which registers actually require synchronization. By
comparison, in the (non-group) basic block scenario all
abstract registers must be synchronized at the end of
every basic block.

At the end of a member block, three synchronization
scenarios are possible based on the successor. First, if
the successor is a member block which has not yet been
translated, its entry register map 40 is defined to be the
same as the working register map, with the consequence
that no synchronization is necessary. Second, if the
successor block is external to the group, then all
abstract registers must be synchronized (i.e., a full
synchronization) because control will return to the
translator code 19 before the successor's execution.
Third, if the successor block is a member block whose
register map has already been fixed, then synchronization
code must be inserted to reconcile the working map with
the successor's entry map.
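
The three scenarios can be expressed as a small decision
function. The sketch below is illustrative only: register
maps are modelled as dictionaries from abstract register to
target register, and SyncKind, sync_for_successor and the
entry_maps dictionary are hypothetical names.

```python
from enum import Enum


class SyncKind(Enum):
    NONE = "none"            # successor's entry map is defined from the working map
    FULL = "full"            # successor is outside the group
    RECONCILE = "reconcile"  # successor's entry map is already fixed


def sync_for_successor(successor, group_members, entry_maps, working_map):
    """Decide what synchronization is needed at the end of a member block."""
    if successor not in group_members:
        return SyncKind.FULL
    if successor not in entry_maps:
        # First arrival: fix the successor's entry map to the working map.
        entry_maps[successor] = dict(working_map)
        return SyncKind.NONE
    return SyncKind.RECONCILE


members = {"A", "B"}
entry_maps = {"A": {}}  # A has already been translated
working = {"r1": "t1"}  # working register map at the end of A

to_b = sync_for_successor("B", members, entry_maps, working)
to_a = sync_for_successor("A", members, entry_maps, working)
to_outside = sync_for_successor("X", members, entry_maps, working)
print(to_b, to_a, to_outside)
```

In the RECONCILE case the amount of code actually planted
depends on how the two maps differ, register by register.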

Some of the cost of register map synchronization is
reduced by the group block ordering traversal, which
minimizes register synchronization or eliminates it
entirely along hot paths. Member blocks are translated in
the order generated by the ordering traversal. As each
member block is translated, its exit register map is
propagated into the entry register map 40 of all successor
member blocks whose entry register maps are not yet fixed.
In effect, the hottest path in the group block is
translated first, and most if not all member block
boundaries along that path require no synchronization
because the corresponding register maps are all
consistent.

For example, the boundary between the first and second
member blocks will always require no synchronization,
because the second member block will always have its entry
register map 40 fixed to be the same as the exit register
map 41 of the first member block. Some synchronization
between member blocks may be unavoidable because group
blocks can contain internal control branches and multiple
entry points. This means that execution may reach the
same member block from different predecessors, with
different working register maps at different times. These
cases require that the translator 19 synchronize the
working register map with the appropriate member block's
entry register map.

If required, register map synchronization occurs on
member block boundaries. The translator 19 inserts code
at the end of a member block to synchronize the working
register map with the successor's entry register map 40.
In register map synchronization, each abstract register
falls under one of ten synchronization conditions. Table
1 illustrates the ten register synchronization cases as a
function of the translator's working register map and the
successor's entry register map 40. Table 2 describes the
register synchronization algorithm, by enumerating the ten
formal synchronization cases with text descriptions of the
cases and pseudo-code descriptions of the corresponding
synchronization actions (the pseudo-code is explained
below). Thus, at every member block boundary, every
abstract register is synchronized using the 10-case
algorithm. This detailed articulation of synchronization
conditions and actions allows the translator 19 to
generate efficient synchronization code, which minimizes
the synchronization cost for each abstract register.

The following describes the synchronization action
functions listed in Table 2. "Spill(E(a))" saves abstract
register a from target register E(a) into the subject
register bank (a component of the global register store).
"Fill(t, a)" loads abstract register a from the subject
register bank into target register t. "Reallocate()"
moves and reallocates (i.e., changes the mapping of) an
abstract register to a new target register if available,
or spills the abstract register if a target register is
not available. "FreeNoSpill(t)" marks a target register
as free without spilling the associated abstract subject
register. The FreeNoSpill() function is necessary to
avoid superfluous spilling across multiple applications of
the algorithm at the same synchronization point. Note
that for cases with a "Nil" synchronization action, no
synchronization code is necessary for the corresponding
abstract registers.

Table 1: Enumeration of the [caption truncated; table body not reproduced]

Table 2: Register Map Synchronization Scenarios

Case 1
  Condition:   a ∉ (dom E ∪ dom W)
  Maps:        W(...)
               E(...)
  Description: The abstract register is in neither the working
               rmap nor the entry rmap.
  Action:      Nil

Case 2
  Condition:   a ∈ dom W ∧ a ∉ dom E ∧ W(a) ∉ rng E
  Maps:        W(a=>t1,...)
               E(...)
  Description: The abstract register is in the working rmap, but
               not in the entry rmap. Furthermore the target
               register used in the working rmap is not in the
               range of the entry rmap.
  Action:      Spill(W(a))

Case 3
  Condition:   a ∈ dom W ∧ a ∉ dom E ∧ W(a) ∈ rng E
  Maps:        W(a1=>t1,...)
               E(ax=>t1,...)
  Description: The abstract register is in the working rmap, but
               not in the entry rmap. However the target register
               used in the working rmap is in the range of the
               entry rmap.
  Action:      Spill(W(a))

Case 4
  Condition:   a ∉ dom W ∧ a ∈ dom E ∧ E(a) ∉ rng W
  Maps:        W(...)
               E(a1=>t1,...)
  Description: The abstract register is in the entry rmap but not
               in the working rmap. Furthermore the target
               register used in the entry rmap is not in the range
               of the working rmap.
  Action:      Fill(E(a), a)

Case 5
  Condition:   a ∉ dom W ∧ a ∈ dom E ∧ E(a) ∈ rng W
  Maps:        W(ax=>t1,...)
               E(a1=>t1,...)
  Description: The abstract register is in the entry rmap but not
               in the working rmap. However the target register
               used in the entry rmap is in the range of the
               working rmap.
  Action:      Reallocate(E(a))
               Fill(E(a), a)

Case 6
  Condition:   a ∈ (dom W ∩ dom E) ∧ W(a) ∉ rng E ∧ E(a) ∉ rng W
  Maps:        W(a1=>t1,...)
               E(a1=>t2,...)
  Description: The abstract register is in the working rmap and
               the entry rmap. However both use different target
               registers. Furthermore the target register used in
               the working rmap is not in the range of the entry
               rmap and the target register used in the entry rmap
               is not in the range of the working rmap.
  Action:      Copy W(a) => E(a)
               FreeNoSpill(W(a))

Case 7
  Condition:   a ∈ (dom W ∩ dom E) ∧ W(a) ∉ rng E ∧ E(a) ∈ rng W
  Maps:        W(a1=>t1,ax=>t2,...)
               E(a1=>t2,...)
  Description: The abstract register in the working rmap is in the
               entry rmap. However both use different target
               registers. The target register used in the working
               rmap is not in the range of the entry rmap, however
               the target register used in the entry rmap is in
               the range of the working rmap.
  Action:      Spill(E(a))
               Copy W(a) => E(a)
               FreeNoSpill(W(a))

Case 8
  Condition:   a ∈ (dom W ∩ dom E) ∧ W(a) ∈ rng E ∧ E(a) ∉ rng W
  Maps:        W(a1=>t1,...)
               E(a1=>t2,ax=>t1,...)
  Description: The abstract register in the working rmap is in the
               entry rmap. However both use different target
               registers. The target register used in the entry
               rmap is not in the range of the working rmap,
               however the target register used in the working
               rmap is in the range of the entry rmap.
  Action:      Copy W(a) => E(a)
               FreeNoSpill(W(a))

Case 9
  Condition:   a ∈ (dom W ∩ dom E) ∧ W(a) ∈ rng E ∧ E(a) ∈ rng W
               ∧ W(a) ≠ E(a)
  Maps:        W(a1=>t1,ax=>t2,...)
               E(a1=>t2,ay=>t1,...)
  Description: The abstract register in the working rmap is in the
               entry rmap. Both use different target registers.
               However, the target register used in the entry rmap
               is in the range of the working rmap, and the target
               register used in the working rmap is in the range
               of the entry rmap.
  Action:      Spill(E(a))
               Copy W(a) => E(a)
               FreeNoSpill(W(a))

Case 10
  Condition:   a ∈ (dom W ∩ dom E) ∧ W(a) ∈ rng E ∧ E(a) ∈ rng W
               ∧ W(a) = E(a)
  Maps:        W(a1=>t1,...)
               E(a1=>t1,...)
  Description: The abstract register in the working rmap is in the
               entry rmap. Furthermore they both map to the same
               target register.
  Action:      Nil
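
Table 2 can be read as a decision procedure over the two
maps. The following sketch classifies one abstract register
into its case number and lists the corresponding actions as
strings. It is a simplified illustration: W and E are
modelled as dictionaries from abstract register to target
register, the function name classify is hypothetical, and
the sketch does not model how the working map changes as
successive registers are synchronized.

```python
def classify(a, W, E):
    """Return (case_number, actions) for abstract register a.

    W -- working register map, abstract register -> target register
    E -- successor's entry register map, same shape
    Actions name the target registers involved, as in Table 2.
    """
    in_w, in_e = a in W, a in E
    rng_w, rng_e = set(W.values()), set(E.values())

    if not in_w and not in_e:
        return 1, []
    if in_w and not in_e:
        case = 3 if W[a] in rng_e else 2
        return case, [f"Spill({W[a]})"]
    if in_e and not in_w:
        if E[a] in rng_w:
            return 5, [f"Reallocate({E[a]})", f"Fill({E[a]}, {a})"]
        return 4, [f"Fill({E[a]}, {a})"]

    # From here on a is in both maps.
    if W[a] == E[a]:
        return 10, []
    copy = [f"Copy {W[a]} => {E[a]}", f"FreeNoSpill({W[a]})"]
    w_in_e = W[a] in rng_e
    e_in_w = E[a] in rng_w
    if e_in_w:
        # The destination register is occupied in the working map.
        return (9 if w_in_e else 7), [f"Spill({E[a]})"] + copy
    return (8 if w_in_e else 6), copy


# Example maps taken from the Case 7 row of Table 2.
case, actions = classify("a1", {"a1": "t1", "ax": "t2"}, {"a1": "t2"})
print(case, actions)
```

Cases 1 and 10 return an empty action list, matching the
"Nil" entries: no synchronization code is planted for those
registers.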



The translator 19 performs two levels of register
allocation within a group block, global and local (or
temporary). Global register allocation is the definition
of particular register mappings, before code generation,
which persist across an entire group block (i.e.,
throughout all member blocks). Local register allocation
consists of the register mappings created in the process
of code generation. Global register allocation defines
particular register allocation constraints which
parameterize the code generation of member blocks, by
constraining local register allocation.

Abstract registers that are globally allocated do not
require synchronization on member block boundaries,
because they are guaranteed to be allocated to the same
respective target registers in every member block. This
approach has the advantage that synchronization code
(which compensates for differences in register mappings
between blocks) is never required for globally allocated
abstract registers on member block boundaries. The
disadvantage of group block register mapping is that it
hinders local register allocation because the globally
allocated target registers are not immediately available
for new mappings. To compensate, the number of global
register mappings may be limited for a particular group
block.

The number and selection of actual global register
allocations are defined by a global register allocation
policy. The global register allocation policy is
configurable based on subject architecture, target
architecture, and applications translated. The optimal
number of globally allocated registers is derived
empirically, and is a function of the number of target
registers, the number of subject registers, the type of
application being translated, and application usage
patterns. The number is generally a fraction of the total
number of target registers minus some small number to
ensure that enough target registers remain for temporary
values.

In cases where there are many subject registers but
few target registers, such as the MIPS-X86 and PowerPC-X86
translators, the number of globally allocated registers is
zero. This is because the X86 architecture has so few
target registers that using any fixed register allocation
has been observed to produce worse target code than none
at all.

In cases where there are many subject registers and
many target registers, such as the X86-MIPS translator,
the number of globally allocated registers (n) is three
quarters the number of target registers (T). Hence:

X86-MIPS: n = 3/4 * T

Even though the X86 architecture has few general
purpose registers, it is treated as having many subject
registers because many abstract registers are necessary to
emulate the complex X86 processor state (including, e.g.,
condition code flags).

In cases where the number of subject registers and
target registers is approximately the same, such as the
MIPS-MIPS accelerator, most target registers are globally
allocated with only a few reserved for temporary values.
Hence:

MIPS-MIPS: n = T - 3



In cases where the total number of subject registers
in use across the entire group block (s) is less than or
equal to the number of target registers (T), all subject
registers are globally mapped. This means that the entire
register map is constant across all member blocks. In the
special case where (s = T), meaning that the number of
target registers and active subject registers is equal,
there are no target registers left for temporary
calculations; in this case, temporary values are
locally allocated to target registers that are globally
allocated to subject registers that have no further uses
within the same expression tree (such information is
obtained through liveness analysis).
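
The policy examples above can be gathered into one
function for comparison. This is only a sketch of the
figures quoted in the text: the function name
global_register_count is hypothetical, the order in which
the rules are tested (the s <= T rule first) is an
assumption not stated in the description, and rounding
3/4 * T down to an integer is likewise an assumption.

```python
# Architecture pairings for which the text reports zero global registers.
ZERO_GLOBAL_PAIRS = {("MIPS", "X86"), ("PowerPC", "X86")}


def global_register_count(subject, target, T, s):
    """Number of globally allocated registers (n).

    T -- number of target registers
    s -- number of subject registers in use across the group block
    """
    if s <= T:
        return s  # assumption: this rule takes precedence over the others
    if (subject, target) in ZERO_GLOBAL_PAIRS:
        return 0
    if (subject, target) == ("X86", "MIPS"):
        return (3 * T) // 4  # n = 3/4 * T, rounded down (assumed)
    if (subject, target) == ("MIPS", "MIPS"):
        return T - 3
    raise ValueError("no policy recorded for this pairing")


print(global_register_count("X86", "MIPS", T=32, s=40))
print(global_register_count("MIPS", "MIPS", T=32, s=33))
print(global_register_count("MIPS", "X86", T=8, s=32))
```

In practice the text describes these numbers as empirically
derived, so a real policy would likely be a tunable table
rather than fixed constants.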

At the end of group block creation, code generation is
performed for each member block, in the traversal order.
During code generation, each member block's IR forest is
(re)generated and the list of dead subject registers
(contained in that block's liveness information) is used
to prune the IR forest prior to generating target
code. As each member block is translated, its exit
register map is propagated to the entry register maps 40
of all successor member blocks (except those which have
already been fixed). Because blocks are translated in
traversal order, this has the effect of minimizing
register map synchronization along hot paths, as well as
making hot path translations contiguous in the target
memory space. As with basic block translations, group
member block translations are specialized on a set of
entry conditions, namely the current working conditions
when the group block was created.

Figure 7 provides an example of group block generation
by the translator code 19 according to an illustrative
embodiment. The example group block has five members ("A"
to "E"), and initially one entry point ("Entry 1"; Entry 2
is generated later through aggregation, as discussed
below) and three exit points ("Exit 1," "Exit 2," and "Exit
3"). In this example, the trigger threshold for group
block creation is an execution count of 45000, and the
inclusion threshold for member blocks is an execution
count of 1000. The construction of this group block was
triggered when block A's execution count (now 45074)
reached the trigger threshold of 45000, at which point a
search of the control flow graph was performed in order to
identify the group block members. In this example, five
blocks were found that exceeded the inclusion threshold of
1000. Once the member blocks are identified, an ordered
depth first search (ordered by profiling metric) is
performed such that hotter blocks and their successors are
processed first; this produces a set of blocks with a
critical path ordering.
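
The selection and ordering steps can be sketched with a
depth first search that visits hotter successors first. In
the sketch below, only block A's count (45074) and the two
thresholds come from the example; the control flow edges
and the other execution counts are invented for
illustration, and order_members is a hypothetical name.
For brevity the sketch folds member selection and ordering
into a single traversal.

```python
def order_members(cfg, counts, trigger, trigger_threshold, inclusion_threshold):
    """Return member blocks in critical path (hottest-first DFS) order."""
    if counts.get(trigger, 0) < trigger_threshold:
        return []  # group block creation is not triggered yet
    order, seen = [], set()

    def visit(block):
        seen.add(block)
        order.append(block)
        successors = sorted(cfg.get(block, []),
                            key=lambda b: counts.get(b, 0), reverse=True)
        for succ in successors:
            if succ not in seen and counts.get(succ, 0) >= inclusion_threshold:
                visit(succ)

    visit(trigger)
    return order


# Hypothetical control flow graph and execution counts.
cfg = {
    "A": ["B", "X1"],
    "B": ["C", "A"],
    "C": ["E", "D"],
    "D": ["E"],
    "E": ["A", "X3"],
}
counts = {"A": 45074, "B": 44000, "C": 40000, "D": 39000,
          "E": 38000, "X1": 74, "X3": 10}

order = order_members(cfg, counts, "A", 45000, 1000)
print(order)
```

Blocks X1 and X3 fall below the inclusion threshold and so
remain outside the group; they correspond to exit points.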



At this stage global dead code elimination is
performed. Each member block is analyzed for register
uses and definitions (i.e., liveness analysis). This
makes code generation more efficient in two ways. First,
local register allocation can take into account which
subject registers are live in the group block (i.e., which
subject registers will be used in the current or successor
member blocks), which helps to minimize the cost of
spills; dead registers are spilled first, because they do
not need to be restored. In addition, if liveness
analysis shows that a particular subject register is
defined, used, and then redefined (overwritten), the value
can be thrown away any time after the last use (i.e., its
target register can be freed). If liveness analysis shows
that a particular subject register value is defined and
then redefined without any intervening uses (unlikely, as
this would mean that the subject compiler generated dead
code), then the corresponding IR tree for that value can
be thrown away, such that no target code is ever generated
for it.

Global register allocation is next. The translator 19
assigns frequently accessed subject registers a fixed
target register mapping which is constant across all
member blocks. Globally allocated registers are
non-spillable, meaning that those target registers are
unavailable to local register allocation. A percentage of
target registers must be kept for temporary subject
register mappings when there are more subject registers
than target registers. In special cases where the entire
set of subject registers within the group block can fit
into target registers, spills and fills are completely
avoided. As illustrated in Figure 7, the translator
plants code ("Pr1") to load these registers from the
global register store 27 prior to entering the head of the
group block ("A"); such code is referred to as prologue
loads.

The group block is now ready for target code 
generation. During code generation, the translator 19 
uses a working register map (the mapping between abstract 
registers and target registers) to keep track of register 
allocation. The value of the working register map at the 
beginning of each member block is recorded in that block's 
associated entry register map 40. 

First the prologue block Pr1 is generated which loads
the globally allocated abstract registers. At this point
the working register map at the end of Pr1 is copied to
the entry register map 40 of block A.

Block A is then translated, planting target code
directly following the target code for Pr1. Control flow
code is planted to handle the exit condition for Exit 1,
which consists of a dummy branch (to be patched later) to
epilogue block Ep1 (to be planted later). At the end of
block A, the working register map is copied to the entry
register map 40 of block B. This fixing of B's entry
register map 40 has two consequences: first, no
synchronization is necessary on the path from A to B;
second, entry to B from any other block (i.e., a member
block of this group block or a member block of another
group block using aggregation) requires synchronization of
that block's exit register map with B's entry register
map.



Block B is next on the critical path. Its target code
is planted directly following block A, and code to handle
the two successors, C and A, is then planted. The first
successor, block C, has not yet had its entry register map
40 fixed, so the working register map is simply copied
into C's entry register map. The second successor, block
A, however, has previously had its entry register map 40
fixed and therefore the working register map at the end of
block B and the entry register map 40 of block A may
differ. Any difference in the register maps requires some
synchronization ("B-A") along the path from block B to
block A in order to bring the working register map into
line with the entry register map 40. This synchronization
takes the form of register spills, fills, and swaps and is
detailed in the ten register map synchronization scenarios
above.



Block C is now translated and target code is planted
directly following block B. Blocks D and E are likewise
translated and planted contiguously. The path from E to A
again requires register map synchronization, from E's exit
register map (i.e., the working register map at the end of
E's translation) to A's entry register map 40, which is
planted in block "E-A."

Prior to exiting the group block and returning control
to the translator 19, the globally allocated registers
must be synchronized to the global register store; this
code is referred to as epilogue saves. After the member
blocks have been translated, code generation plants
epilogue blocks for all exit points (Ep1, Ep2, and Ep3),
and fixes the branch targets throughout the member blocks.
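
The walk-through above can be summarized by a small
planning routine that visits the member blocks in traversal
order and sorts each outgoing edge into one of three
groups: edges that need no synchronization, edges that need
synchronization code, and exits that need an epilogue. The
sketch is illustrative; plan_group_block is a hypothetical
name, and the assignment of Exit 2 and Exit 3 to blocks C
and E is assumed, since the text does not state which
blocks own those exits.

```python
def plan_group_block(order, successors):
    """Classify every outgoing edge of the member blocks.

    Returns (free, sync, exits):
      free  -- edges whose target's entry map is fixed by this edge
      sync  -- edges into a block whose entry map was already fixed
      exits -- edges leaving the group (epilogue, full synchronization)
    """
    members = set(order)
    fixed = {order[0]}  # the head's entry map is fixed by the prologue
    free, sync, exits = [], [], []
    for block in order:
        for succ in successors[block]:
            if succ not in members:
                exits.append((block, succ))
            elif succ in fixed:
                sync.append((block, succ))
            else:
                fixed.add(succ)  # working map copied into the entry map
                free.append((block, succ))
    return free, sync, exits


# Edges loosely modelled on the Figure 7 walk-through (partly assumed).
successors = {
    "A": ["B", "Exit 1"],
    "B": ["C", "A"],
    "C": ["D", "Exit 2"],
    "D": ["E"],
    "E": ["A", "Exit 3"],
}
free, sync, exits = plan_group_block(["A", "B", "C", "D", "E"], successors)
print(sync)
```

Under these assumptions the only edges that need planted
synchronization code are B to A and E to A, which matches
the "B-A" and "E-A" blocks described above.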

In embodiments that use both isoblocks and group 
blocks, the control flow graph traversal is made in terms 
of unique subject blocks (i.e., a particular basic block 
in the subject code) rather than isoblocks of that block. 
As such, isoblocks are transparent to group block 
creation. No special distinction is made with respect to 
subject blocks that have one translation or multiple 
translations. 

In the illustrative embodiment, both the group block 
and isoblock optimizations may be advantageously employed. 
However, the fact that the isoblock mechanism may create 
different basic block translations for the same subject 
code sequence complicates the process of deciding which 
blocks to include in the group block, since the blocks to 
be included may not exist until the group block is formed. 
The information collected using the unspecialized blocks 
that existed prior to the optimization must be adapted 
before being used in the selection and layout process.

The illustrative embodiment further employs a
technique for accommodating features of nested loops in 
group block generation. Group blocks are originally 
created with only one entry point, namely the start of the 
trigger block. Nested loops in a program cause the inner 
loop to become hot first, creating a group block 
representing the inner loop. Later, the outer loop 
becomes hot, creating a new group block that includes all 
the blocks of the inner loop as well as the outer loop. 
If the group block generation algorithm does not take 
account of the work done for the inner loop, but instead 
re-does all of that work, then programs that contain 
deeply nested loops will progressively generate larger and 
larger group blocks, requiring more storage and more work 
on each group block generation. In addition, the older 
(inner) group blocks may become unreachable and therefore 
provide little or no benefit. 

According to the illustrative embodiment, group block
aggregation is used to enable a previously built group
block to be combined with additional optimized blocks.
During the phase in which blocks are selected for
inclusion in a new group block, those candidates which are
already included in a previous group block are identified.
Rather than planting target code for these blocks,
aggregation is performed, whereby the translator 19
creates a link to the appropriate location in the existing
group block. Because these links may jump to the middle
of the existing group block, the working register map
corresponding to that location must be enforced;
accordingly, the code planted for the link includes
register map synchronization code as required.

The entry register map 40 stored in the basic block
data structure 30 supports group block aggregation.
Aggregation allows other translated code to jump into the
middle of a group block, using the beginning of the member
block as an entry point. Such entry points require that
the current working register map be synchronized to the
member block's entry register map 40, which the translator
19 implements by planting synchronization code (i.e.,
spills and fills) between the exit point of the
predecessor and the entry point of the member block.

In one embodiment, some member blocks' register maps
are selectively deleted to conserve resources. Initially,
the entry register maps of all member blocks in a group
are stored indefinitely, to facilitate entry into the
group block (from an aggregate group block) at the
beginning of any member block. As group blocks become
large, some register maps may be deleted to conserve
memory. If this happens, aggregation effectively divides
the group block into regions, some of which (i.e., member
blocks whose register maps have been deleted) are
inaccessible to aggregate entry. Different policies are
used to determine which register maps to store. One
policy is to store all register maps of all member blocks
(i.e., never delete). An alternative policy is to store
register maps only for the hottest member blocks. An
alternative policy is to store register maps only for
member blocks that are the destinations of backward
branches (i.e., the start of a loop).
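
The three retention policies can be compared with a short
sketch. The policy names ("all", "hottest", "loop_heads"),
the function maps_to_keep, the cut-off parameter keep and
the sample data are all hypothetical; the text does not
specify how many blocks count as "hottest".

```python
def maps_to_keep(members, policy, counts=None, loop_heads=None, keep=2):
    """Return the member blocks whose entry register maps are retained."""
    if policy == "all":
        return set(members)
    if policy == "hottest":
        ranked = sorted(members, key=lambda b: counts[b], reverse=True)
        return set(ranked[:keep])
    if policy == "loop_heads":
        # Destinations of backward branches, i.e. starts of loops.
        return set(members) & set(loop_heads)
    raise ValueError(f"unknown policy: {policy!r}")


members = ["A", "B", "C", "D", "E"]
counts = {"A": 45074, "B": 44000, "C": 40000, "D": 39000, "E": 38000}

kept_all = maps_to_keep(members, "all")
kept_hot = maps_to_keep(members, "hottest", counts=counts, keep=2)
kept_loops = maps_to_keep(members, "loop_heads", loop_heads={"A"})
print(sorted(kept_hot), sorted(kept_loops))
```

Blocks outside the returned set would be unavailable as
aggregate entry points, which is the trade-off the text
describes between memory use and reachability.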

In another embodiment, the data associated with each
group member block includes a recorded register map for
every subject instruction location. This allows other
translated code to jump into the middle of a group block
at any point, not just the beginning of a member block,
as, in some cases, a group member block may contain
undetected entry points when the group block is formed.
This technique consumes large amounts of memory, and is
therefore only appropriate when memory conservation is not
a concern.

Group blocking provides a mechanism for identifying 
frequently executed blocks or sets of blocks and 
performing additional optimizations on them. Because more 
computationally expensive optimizations are applied to 
group blocks, their formation is preferably confined to 
basic blocks which are known to execute frequently. In 
the case of group blocks, the extra computation is 
justified by frequent execution; contiguous blocks which 
are executed frequently are referred to as a "hot path." 

Embodiments may be configured wherein multiple levels 
of frequency and optimization are used, such that the 
translator 19 detects multiple tiers of frequently 
executed basic blocks, and increasingly complex 
optimizations are applied. Alternatively, as described
above, only two levels of optimization are used: basic
optimizations are applied to all basic blocks, and a
single set of further optimizations is applied to group
blocks using the group block creation mechanism described
above.

Figure 8 illustrates the steps performed by the
translator at run-time, between executions of translated
code. When a first basic block (BB_{N-1}) finishes execution
1201, it returns control to the translator 1202. The
translator increments the profiling metric of the first
basic block 1203. The translator then queries the basic
block cache 1205 for previously translated isoblocks of
the current basic block (BB_N, which is BB_{N-1}'s successor),
using the subject address returned by the first basic
block's execution. If the successor block has already
been translated, the basic block cache will return one or
more basic block data structures. The translator then
compares the successor's profiling metric to the group
block trigger threshold 1207 (this may involve aggregating
the profiling metrics of multiple isoblocks). If the
threshold is not met, the translator then checks if any
isoblocks returned by the basic block cache are compatible
with the working conditions (i.e., isoblocks with entry
conditions identical to the exit conditions of BB_{N-1}). If
a compatible isoblock is found, that translation is
executed 1211.

If the successor profiling metric exceeds the group
block trigger threshold, then a new group block is created
1213 and executed 1211, as discussed above, even if a
compatible isoblock exists.

If the basic block cache does not return any isoblocks, or
none of the isoblocks returned are compatible, then the
current block is translated 1217 into an isoblock
specialized on the current working conditions, as
discussed above. At the end of decoding BB_N, if the
successor of BB_N (BB_{N+1}) is statically determinable 1219,
then an extended basic block is created 1215. If an extended
basic block is created, then BB_{N+1} is translated 1217, and
so forth. When translation is complete, the new isoblock
is stored in the basic block cache 1221 and then executed
1211.
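
The decision sequence of Figure 8 can be sketched as a single dispatch function. The step numbers in the comments come from the description above; the `Isoblock` and `Translator` names, the string return values, and the use of a strict `>` comparison against the threshold are assumptions for illustration. Extended basic block creation (1215, 1219) and the actual construction of a group block are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Isoblock:
    subject_addr: int
    entry_conditions: str      # stand-in for the block's entry conditions
    profile_count: int = 0


@dataclass
class Translator:
    trigger_threshold: int
    cache: dict = field(default_factory=dict)   # subject address -> list of isoblocks

    def next_action(self, finished, successor_addr, working_conditions):
        finished.profile_count += 1                          # 1203: bump profiling metric
        isoblocks = self.cache.get(successor_addr, [])       # 1205: query basic block cache
        total = sum(b.profile_count for b in isoblocks)      # aggregate over isoblocks
        if isoblocks and total > self.trigger_threshold:     # 1207: threshold check
            return "create_group_block"                      # 1213
        for block in isoblocks:
            if block.entry_conditions == working_conditions:
                return "execute_existing"                    # 1211: compatible isoblock
        # No compatible isoblock: translate one specialized on the working conditions.
        new_block = Isoblock(successor_addr, working_conditions)      # 1217
        self.cache.setdefault(successor_addr, []).append(new_block)   # 1221
        return "translate_new"
```

Note that the threshold test runs before the compatibility search, which mirrors the statement that a group block is created even if a compatible isoblock exists.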

Partitions 

In a preferred embodiment, the translator 19 employs a 
partitioning technique to divide the subject code space 
into regions, referred to hereafter as partitions. Each 
partition includes a distinct set of translator data 
structures (e.g., basic blocks) and target code. A 
partition represents a range of subject code 17 addresses 
which does not overlap with any other partition, meaning 
that no translated block spans between two partitions. As 
such, all translations of a particular subject code 
address are present in only a single partition.

The partitioning technique utilized by the translator
19 divides the translator 19's representation of control
flow into non-overlapping regions of subject memory. In
some cases, these partitions correspond to the different
libraries and object files that make up a subject program,
such that partitions are created and replaced as libraries
are mapped into subject memory and later discarded. The
partitioning technique is particularly utilized by the
translator 19 when translating subject programs that
modify their own subject code 17, hereinafter referred to
as "self-modifying". By dividing subject code 17 and its
corresponding translated target code 21 into partitions,
one partition can be discarded when its associated code is
modified without affecting the valuable information (e.g.,
translations) built up in other partitions. Without the
use of such partitions, all existing translations, even
those not affected by the code modification, would need to
be discarded in response to every code modification.

In a preferred embodiment, the translator 19 maintains 
segregated data structures, such that translation data 
structures are organized by partitions. This segregation 
allows for the numerous block translations (and affiliated 
data structures) that are associated with a single 
partition to be discarded and freed at once, rather than 
traversing the underlying representations of translated 
code to search for affected translations and deleting them 
one at a time. For example, in one embodiment the 
translator 19 maintains a separate basic block cache 23 
for each partition. 

When subject code 17 modifies other subject code, the 
"modification event" is detected by the translator 19. A 
modification event is defined to correspond to a 
particular subject address range (the range of subject 
code that is overwritten or deleted by the modification).
The translator 19 must first detect when and where subject
code is self-modified (i.e., which subject code performs
the modification, and which subject code is modified).
Modification events are any events when subject code 17 is
modified by subject code 17. Self-modifying code does not
include cases in which subject code 17 modifies subject
data. Self-modification of subject code 17 can take many
forms, including but not limited to: (i) mapping a file
into memory (e.g., mmap() system calls); (ii) removing a
file from memory (e.g., munmap() system calls); and (iii)
making a memory region executable (e.g., changing its
permissions using the mprotect() system call).

Alternatively, in cases where subject code
modifications are made using a well known system call,
such as mmap(), munmap() or mprotect(), the translator 19
detects all calls to that system call in the subject code.
The translator 19 can also detect subject code 17
modifications by other mechanisms. For example, many
subject processors require that, prior to executing code
that has been modified, the subject program must first
flush the processor's instruction cache (I-cache). In
particular, the PowerPC architecture has a special
instruction for this purpose, "ICBI" (Instruction Cache
Block Invalidate), while other architectures may use a
special system call for the same purpose. On
architectures with such a cache flush requirement, the
translator 19 may detect modification events by detecting
instances of the special instruction (or special system
call).

Alternatively, the translator 19 can use features of 
the target operating system 20 to monitor all writes to
the target memory regions which correspond to subject
code, in order to detect modification events. On certain
systems, the system call mprotect() allows the translator
19 to set particular regions of memory "read-only". Other
systems may utilize equivalent functions to that of the
mprotect() system call to define a particular area of
memory as read-only. Any attempt to write into read-only
regions triggers a signal which is detected by the
translator 19. The signal notifies the translator 19 that
a subject code modification is taking place, and the
translator 19 uses the signal context to determine which
subject address is being overwritten. After detecting the
modification and identifying the scope of the modification
(i.e., the subject addresses affected), the translator 19
then allows the write to proceed normally. 

Alternatively, the translator 19 can generate a 
special target code sequence, for each translated memory 
write operation in the subject code, which checks if the 
write address corresponds to subject code rather than 
subject data, in order to detect modification events. 

After a modification event has been detected, the 
modification results in the creation of a new partition 
whose subject address range is the same as the modified 
range. In addition, if any existing partitions had 
address ranges that intersected (overlapped) the modified 
range, all such intersecting partitions are destroyed and 
re-created, except in certain situations where additional 
partition optimizations are applied as described below. 
For each intersecting partition that is destroyed, a new
remainder partition is created for the remaining range, 
meaning the original range of the intersecting partition 
minus the intersection. When a partition is destroyed, 
all existing target code 21 translations associated with 
that partition are discarded. As such, a new partition 
is, at first, completely empty of translated target code 
21, meaning that any subject code 17 subsequently 
encountered from that partition must be translated from 
scratch.

Figure 9 illustrates the creation of a new partition, 
from a set of existing partitions and a modification event 
which overlaps some of those partitions. Prior to the 
modification, four partitions A, B, C and D exist. The 
subject code then modifies a range of subject code 
addresses extending from point 100 in memory to point 102.
The range of the modification 104 intersects with
partitions A and C and totally encompasses partition B.
Partition M is created for the entire modified range 104.
Partitions A and C are destroyed, and partitions A' and C'
are created, respectively, for the remaining portions of A
and C which do not intersect the modification range 104. 
Partition B is destroyed and its range is completely 
subsumed by the new partition M. Partition D, whose range 
does not overlap the modification range 104, remains 
unaffected. 
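
The repartitioning shown in Figure 9 can be sketched as follows. This is a simplified model under stated assumptions: ranges are half-open `[start, end)` integer intervals, the `Partition` class and `apply_modification` function are hypothetical names, and the addresses in the usage note are invented for illustration (they are not the reference numerals 100 and 102 of the figure).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    start: int   # inclusive subject address
    end: int     # exclusive subject address


def apply_modification(partitions, mod_start, mod_end, new_name="M"):
    """Return the partition list after a modification event over [mod_start, mod_end)."""
    result = []
    for p in partitions:
        if p.end <= mod_start or p.start >= mod_end:
            result.append(p)               # no overlap: partition is unaffected
            continue
        # Intersecting partition is destroyed; only remainder ranges are re-created.
        if p.start < mod_start:
            result.append(Partition(p.name + "'", p.start, mod_start))
        if p.end > mod_end:
            result.append(Partition(p.name + "'", mod_end, p.end))
        # A partition fully inside the modified range leaves no remainder.
    result.append(Partition(new_name, mod_start, mod_end))
    return sorted(result, key=lambda p: p.start)
```

With partitions A `[0, 40)`, B `[40, 60)`, C `[60, 100)` and D `[100, 140)`, a modification over `[30, 70)` yields A' `[0, 30)`, M `[30, 70)`, C' `[70, 100)` and an untouched D, matching the outcome described for Figure 9. In this sketch every re-created remainder starts empty, since the destroyed partition's translations are discarded.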

One optimization technique that can be applied to 
partitioning in order to improve performance of the 
translator 19 is lazy partition allocation. In order to 
save memory, the full data structures of a partition are 
not allocated at the time a partition is created. 
Instead, an inactive partition is initially created which 
reserves the partition's subject address range, but the 
translator 19 does not initially allocate any of the 
underlying translation data structures and memory. In 
this manner, the initial inactive partitions are 
essentially empty skeleton partitions that merely reserve 
a particular subject address range. When any subject code 
17 within the partition is actually translated, the 
inactive partition becomes a live partition and the 
partition data structures are initialized. Accordingly, 
modification events which correspond to data segments of 
the subject program (e.g., files which are mmap()ed as 
data, and which the subject program never executes as 
code) cause a new inactive partition to be created, but 
none of the underlying translation data structures and 
memory regions are allocated. 

In a preferred embodiment, the translator 19 allocates
large regions of memory for new partitions to store target
code 21 translations. Target code 21 is preferably
allocated to be contiguous so as to avoid unnecessary
fragmentation. The translator 19 is more likely to be
able to make target code 21 contiguous if it has a large
memory region to use. Partitions which are falsely
detected (i.e., modification events in subject memory
ranges which do not correspond to executable subject code)
can sometimes consume valuable memory resources for no
purpose. By only allocating translation data structures
and memory when subject code 17 within a partition is
actually translated (and executed), lazy partition
allocation allows the translator 19 to avoid the negative
impact of false positives from the modification event
detection mechanism associated with partitioning.

In certain situations, the regions of the subject 
memory space which the translator 19 identifies as being
modified by a modification event may never actually be
translated into target code. This may be because the
subject memory region corresponds to data, or because the
subject memory region corresponds to code which is never
executed. To avoid the overhead of wasted memory
resources, an inactive or skeleton partition is created
initially, where the partition's full data structures are
only realized when translation occurs within the
partition's range of subject addresses. Because inactive
partitions contain no translations of subject code, they
can be created, deleted, or resized without the memory and 
performance overhead associated with live partitions. 
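
Lazy partition allocation can be sketched as deferred initialization. The `LazyPartition` class and its members are hypothetical names; a plain dictionary stands in for the per-partition basic block cache and target code region, which in a real translator would be much larger allocations.

```python
class LazyPartition:
    """Skeleton partition that reserves a subject address range only."""

    def __init__(self, start, end):
        self.start, self.end = start, end   # reserved subject range [start, end)
        self.block_cache = None             # nothing allocated while inactive

    @property
    def is_live(self):
        return self.block_cache is not None

    def record_translation(self, subject_addr, target_code):
        if not (self.start <= subject_addr < self.end):
            raise ValueError("address outside partition range")
        if self.block_cache is None:
            # First translation inside the range: the partition becomes live
            # and its translation data structures are allocated now.
            self.block_cache = {}
        self.block_cache[subject_addr] = target_code
```

A partition created for an mmap()ed data file would stay inactive for its whole lifetime in this model, so deleting or resizing it costs nothing beyond updating its range.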

In an alternative embodiment of the translator 19
which also employs interpreter functionality, the
translator 19 may invoke interpreter functions to
interpret subject code 17 rather than translating it,
specifically for the purpose of delaying translation and
thereby avoiding the allocation of partitions associated
with that subject code 17. A translation method and
apparatus which facilitates such interpreter functionality
is described in co-pending UK Patent Applications Serial
No. 0309056.0 and 0315164.4, both entitled "Block
Translation Optimizations for Program Code Conversion",
the disclosure of which is incorporated herein by
reference.

In another preferred embodiment of the translator 19,
an optimization technique that can be applied to
partitioning in order to improve performance of the
translator 19 is reducible partitions, where the address
range of a partition is reduced rather than deleting the
partition, to a range which includes the range of
addresses actually used in the partition. Initially in
response to a modification event, a new partition is
created with a starting address and an ending address
whose values are defined by the scope of the modification.
However, in operation, the actual range of subject
addresses that are translated may be narrower than the
defined range of the partition. The range encompassing
all subject addresses within a partition that have
actually been translated or which are flagged for future
translation is referred to as the "active range" or used
range of a partition. The range of subject addresses in a
partition that have not been translated is referred to as
the "inactive range." In order to optimize the
performance of the translator 19, the size of a partition
may be reduced, from its initially defined range of
addresses to a lesser range which includes its active
range of addresses by eliminating at least a portion of
the inactive range of addresses.

While the concept of reducible partitions may be
implemented in any number of ways, in one preferred
embodiment, the translator 19 maintains the active range
or translated range of addresses in each partition. When
a partition is created, its active range of addresses is
initially empty. During translation, as each subject
instruction is translated, its subject address ("the
translated address") is compared to the active range of
the current partition. If the translated address is
outside the current active range, the active range is
expanded to include the translated address. As such, the
active range will grow as translation of the subject code
progresses.
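
Maintaining the active range amounts to tracking the lowest and highest translated addresses. The sketch below is one possible bookkeeping scheme; the `ActiveRange` class and its method names are assumptions for illustration.

```python
class ActiveRange:
    """Smallest address interval covering every translated subject instruction."""

    def __init__(self):
        self.low = None    # lowest translated address, None while empty
        self.high = None   # highest translated address (inclusive)

    def note_translated(self, addr):
        # Called once per translated subject instruction.
        if self.low is None:
            self.low = self.high = addr
        else:
            self.low = min(self.low, addr)
            self.high = max(self.high, addr)

    def overlaps(self, start, end):
        """True if the half-open range [start, end) intersects the active range."""
        return self.low is not None and start <= self.high and end > self.low
```

An empty active range never overlaps anything, so a partition that has not yet been translated can always be resized.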

When the translator 19 detects a modification event,
the translator 19 initially determines if the range of the
modification overlaps the initially defined range of an
existing partition. If an overlap exists, the translator
19 then determines whether the modification range actually
overlaps the partition's active range of subject
addresses. If the modification range does not overlap the
active range, then the translator 19 can simply resize the
partition's initially defined range without deleting the
translation data structures within the partition. The
reason is that, while the initially defined partition
range represents the potential scope of the partition, as
defined by the prior modification event which created the
partition, the active range represents the actual range of
subject addresses translated. Modifications which do not 
overlap the active range therefore do not invalidate any 
of the translations within the partition, because the 
translator 19 knows that the translated subject addresses 
are distinct from, and therefore unaffected by, the 
modified subject addresses. By resizing the partition to 
a range which includes the active range, the translations 
stored within the partition can be kept in that partition 
and need not be deleted. On the other hand, if the 
modification range does overlap the partition's active 
range of addresses, then the translator 19 must delete the 
entire partition as described above. 
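
The two-stage overlap test can be sketched as a pure function. It assumes half-open `(start, end)` tuples, a partition that already has a non-empty active range, and a resize rule that keeps the side of the defined range containing the active range; the function name and the returned labels are hypothetical.

```python
def handle_modification(defined, active, mod):
    """Decide how an existing partition reacts to a modification event.

    defined, active, mod are half-open (start, end) tuples; active lies
    inside defined. Returns (action, new_defined_range).
    """
    d_start, d_end = defined
    a_start, a_end = active
    m_start, m_end = mod

    # Stage 1: does the modification touch the initially defined range at all?
    if m_end <= d_start or m_start >= d_end:
        return ("unaffected", defined)

    # Stage 2: does it touch addresses that were actually translated?
    if m_start < a_end and m_end > a_start:
        return ("destroy", None)        # translations may be stale

    # Only inactive addresses are modified: shrink the defined range and
    # keep every existing translation.
    if m_start >= a_end:
        return ("resize", (d_start, m_start))
    return ("resize", (m_end, d_end))
```

In the Figure 10 scenario, partition A would take the "resize" path and partition C the "destroy" path.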

Referring now to Figure 10, an illustration of the 
reducible partition optimization technique is provided.
The partitions illustrated in Figure 10 are substantially
the same as those shown in Figure 9, except that partition
A includes an active range of subject addresses 106. Prior
to the modification, four partitions A, B, C and D exist.
The subject code then modifies a range of subject code
addresses extending from point 100 in memory to point 102.
The range of the modification 104 intersects with the
initially defined partition range for partition A, but the
range of modification does not intersect with the active
range 106 of partition A. After partition M is created
for the modified range 104, the range of partition A can
be reduced without affecting the translations in partition
A. By contrast, it can be seen in this example that the
range of modification 104 intersects with the active range
108 of partition C. Thus, the range of partition C cannot
simply be reduced and partition C must be destroyed,
whereupon new partition C' is created for the remaining
portion of partition C which is not intersecting with the
range of modification 104. Partition B is destroyed and 
its range is completely subsumed by the new partition M. 
Partition D remains unaffected. 

The translator 19 can detect most modification events 
at decode-time, but the actual subject addresses to be
modified may not be known until the translated code 21 is 
executed. Thus, when the translator 19 detects during 
decoding either a subject code 17 modification or an 
indication that a modification has occurred, the 
translator 19 ends the translation of the current block, 
and inserts a notification immediately after the current 
block to notify the translator of the modification event. 

In one embodiment of the translator 19, the 
partitioning technique is implemented in the translator 19 
with the aid of special blocks. Special blocks are blocks 
which, while they may correspond to a particular subject 
address, do not represent translations of subject code. In
contrast, a translation block as described above
represents a translation of a particular subject code
sequence beginning with a particular starting subject
address. Instead, special blocks contain special actions
of the translator 19 which are inserted into the stream of
target code during translation, before the blocks are
actually executed. In effect, special blocks are
lightweight translator actions, which can be planted at
particular points in the target code control flow without 
requiring an expensive context switch out of target code 
back to the translator loop. 

Special blocks contain pseudo-target code rather than
normal target code. Normal target code represents a
translation of some subject code sequence. Pseudo-target
code consists of artificial (i.e., not translated) target
code sequences, which are either written directly in target
code or written in a high-level programming language
(e.g., C++) and compiled to have the same calling
conventions as target code generated by the translator.
Examples of special blocks include notifyBlocks and border
guard blocks, which are discussed herein.

In one embodiment, the translator inserts a special 
block called a "notifyBlock" into the control flow of the 
translated program at a point immediately following the 
block containing the self-modifying subject code. The
notifyBlock is a "special" translation structure, because 
unlike a translation block it does not represent the 
translation of any particular subject code, but rather it 
represents the modification event. The subject code which 
actually performs the modification is translated as part 
of the current block. The notifyBlock is inserted as the 
successor of the current block. Thus, immediately after 
the modification is performed in the translated current 
block, the subsequent notifyBlock notifies the translator 
to perform the appropriate actions in response to that 
modification (i.e., partition adjustments). The subject 
code 17 that follows the modification event is translated 
in a new block, which becomes the successor of the 
notifyBlock. 

When the notifyBlock is reached in the target code 21
at run-time, the translator 19 is notified of the subject
address range of the modified code. The translator 19
uses this information to create a new partition, which may
also alter or destroy existing partitions.

The block containing the self-modifying subject code
(the predecessor of the notifyBlock) ends immediately
after identifying a modification event, because the
modification may invalidate the current partition. The
modified subject code might be the next subject
instruction, which would require the translator to
translate and execute the newly modified version of that
instruction rather than executing an existing translation
of the old version. For example, on the PowerPC
architecture, a block must end after a cache flush
instruction. Thus, when a modification event is detected,
the current block of translation is ended and a
notifyBlock is inserted as the successor.

The notifyBlock also copies the current subject
address and compatibility list from its predecessor block
to its successor block. After the notification occurs,
translation resumes in the successor block at the next
subject instruction. If the current partition is
destroyed as a result of the operation, the notifyBlock
also handles the safe transition into the new partition
which replaces it.

Because translation blocks must end at partition 
boundaries, partitions can have a negative impact on the 
optimizations described above. Sections of code which 
would otherwise be translated together are translated as
separate blocks due to the partitions formed, reducing the
scope of optimization and increasing the number of returns
to the translator loop. For example, the extended block
and group block optimizations are limited by partition
boundaries. For this reason, one optimization that can be 
applied to partitioning in order to improve performance of 
the translator is, for subject programs that are known not 
to modify their own subject code, aggregating all subject 
memory into a single partition such that all subject code 
is contained in one partition. 

The scope of a partition must be properly defined 
because partition scope impacts the performance of the 
translator. If partitions are too small, control flow 
must constantly pass through border guards (described 
below). If partitions are too large, a slight code
modification will unnecessarily invalidate larger portions 
of translated code. A proper balance of such constraints 
should be considered when selecting partition size as it 
relates to the performance of the translator. 
Optimizations can be applied if it is determined that the 
particular partitions being generated have a negative 
impact on performance. By default, partition size is 
determined by the modification event detection mechanism. 
For example, if the translator 19 detects an mmap() of a 
run-time library, a new partition is created that 
encompasses the entire library. 

One optimization technique that can be applied to 
partitioning in order to improve performance of the 
translator 19 is the aggregation of partitions. For 
example, the PowerPC ICBI cache flush instruction 
invalidates one page of executable memory. However, if 
multiple pages of subject code 17 are modified at once, 
the subject program may contain several consecutive ICBI 
instructions which invalidate contiguous pages of memory. 
Under the partitioning mechanism described above, this
would result in multiple, contiguous, page-wide
partitions. As such, one optimization of the partition
mechanism is to detect consecutive cache flush
instructions and coalesce the modified ranges into one
"aggregated" partition.

In one embodiment of the aggregation optimization,
when a new partition is created, the translator 19 checks
if there is an existing partition (i.e., not a remainder
partition created as a byproduct of the new partition's
creation) that is adjacent to and precedes the new
partition in the subject code. If so, the range of the
preceding existing partition can be expanded to include
the range of the new partition, effectively aggregating
the two partitions.
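
The adjacency check can be sketched with partitions held as mutable `[start, end)` pairs. This is a simplified model: `add_partition` is a hypothetical helper, the new range is assumed not to overlap any existing partition, and the exclusion of remainder partitions mentioned above is not modeled.

```python
def add_partition(partitions, new_start, new_end):
    """Add [new_start, new_end), merging into an adjacent preceding partition.

    partitions is a list of [start, end] lists (end exclusive), mutated in place.
    """
    for p in partitions:
        if p[1] == new_start:
            # An existing partition ends exactly where the new one begins:
            # expand it instead of creating a separate partition.
            p[1] = new_end
            return partitions
    partitions.append([new_start, new_end])
    partitions.sort()
    return partitions
```

Three consecutive page-sized modification events at 0x1000, 0x2000 and 0x3000 would then produce a single partition covering `[0x1000, 0x4000)` rather than three page-wide partitions.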

The translator 19 may further aggregate those regions
of subject code 17 which are known not to be modified,
when such regions are contiguous. For example, in the
situation where an entire subject program is known not to
modify its own subject code, the translator 19 uses a
single partition for all of the subject code 17. By
further example, on the PowerPC architecture, all shared
system libraries are located in the subject address range
0x90000000 - 0xA0000000 (the PowerPC "shared library
region"). Most applications never modify the system
libraries, such that the translator 19 may aggregate the
entire PowerPC shared library region into one partition.

In one embodiment of the partitioning technique, the 
translator inserts an additional level of indirection for 
control flow that travels between partitions. The 
translator 19 uses special placeholders, referred to
herein as border guards, for blocks that touch partition 
boundaries, so that predecessor blocks can check if 
previously translated successor blocks still exist. With 
this approach, predecessor blocks can be efficiently 
notified when their successors are deleted. The 
translator 19 inserts a pair of border guard blocks at 
every point where the subject program's control flow 
crosses a partition boundary, otherwise referred to as 
border crossings. The pair of border guard blocks
includes an exit border guard block and an entry border
guard block. An exit border guard block is added after, 
and in the same partition as, the predecessor block. An 
entry border guard block is added before, and in the same 
partition as, the successor block. 

For translated blocks within the same partition, 
control flow passes directly from the predecessor block to 
the successor block. For blocks in different partitions 
(i.e., border crossings), control flows from the
predecessor block to an exit border guard, then to an
entry border guard, and then to the successor block.

Entry border guards serve as placeholders for the 
translator 19, where entry border guards store references 
to their exit border guard counterparts. However, entry 
border guards perform minimal actions when "executed" 
(i.e., when control flow passes through them). The 
execution of an exit border guard verifies that a 
previously translated successor still exists. The data 
structure of each exit border guard also contains a 
reference to its entry border guard counterpart. Border 
guard blocks serve a bookkeeping function by providing an 
explicit representation of partition crossings. When the
translator 19 deletes a particular partition, it traverses
every entry border guard within that partition to find the
corresponding exit guards. As the partition is deleted,
the counterpart reference of each such exit border guard
is set to null. In some embodiments, the translator 19
stores all of the entry border guards for a given
partition together, such that they can be traversed
efficiently when the partition is deleted. For example,
in one embodiment, the translator 19 maintains an entry
border guard list for each partition, wherein such list is
updated whenever entry border guards are created or
deleted.

Accordingly, when an exit border guard is executed, it
verifies that its successor still exists by simply
checking its own counterpart reference. If the reference
is defined, then a valid translated successor exists and
control passes to the respective entry border guard (and
therefore to the successor partition). If the reference
is undefined, then the exit border guard is unaware of a
valid translated successor block, meaning that either
there was one and it was deleted or this exit border guard
has never been executed and therefore the successor has
never been determined. If the counterpart reference is
undefined, a new successor block is obtained by either
checking the basic block cache of the respective partition
or translating the successor block. From the exit border
guard's perspective, a successor block that was translated
and subsequently deleted is indistinguishable from a
successor block that has never been translated.

The explicit representation of partition border
crossings achieved by the border guard pair makes the
process of partition deletion more efficient. To delete a
partition, the translator 19 must identify and nullify all
references to deleted blocks (i.e., previously translated
blocks within the deleted partition). Absent some
bookkeeping mechanism, such as border guards, the
translator 19 would need to traverse numerous translated 
blocks to identify which blocks had successors in the 
deleted partition, in order to remove the references to 
those discarded successors. In addition to making the 
process of deleting a partition more efficient, border 
guard blocks facilitate thread-safe border crossings, as 
discussed below. 

To delete a partition in the presence of border
guards, the translator 19: (i) voids the "successor" links
of all exit guards that point to the partition, (ii)
notifies all successor partitions that their corresponding
entry guards can be discarded, and (iii) deletes all
translation structures and target code 21 belonging to the
deleted partition. In the first step, the translator 19
traverses all of the entry border guards in the partition,
and resets their foreign exit border guard counterparts,
effectively notifying all predecessor partitions that the
deleted partition is void. In the second step of
partition deletion, all of the exit border guards in the
partition are traversed to notify the successor partitions
that their corresponding entry guards can be discarded.
In one embodiment, the second step is performed by
traversing each exit border guard of the deleted
partition. In another embodiment, all of the entry guards
in a partition are indexed by the predecessor partition
which contains the corresponding exit guard. In this
case, the deleted partition need only notify each
successor partition once, and each successor partition can
then identify the corresponding entry points to be
deleted.
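
The three deletion steps can be illustrated with the
following sketch, which follows the embodiment that
traverses each exit border guard individually. The classes
Partition, ExitGuard and EntryGuard and the helpers link
and delete_partition are hypothetical names introduced for
illustration, and plain Python lists stand in for the
translator's real structures.

```python
class ExitGuard:
    def __init__(self, owner):
        self.owner = owner         # partition containing this exit guard
        self.counterpart = None    # entry guard in the successor partition


class EntryGuard:
    def __init__(self, owner, exit_guard):
        self.owner = owner             # partition this guard leads into
        self.exit_guard = exit_guard   # foreign exit guard that targets it
        exit_guard.counterpart = self


class Partition:
    def __init__(self, name):
        self.name = name
        self.entry_guards = []
        self.exit_guards = []
        self.blocks = []


def link(pred, succ):
    """Create a border guard pair for a crossing from pred into succ."""
    exit_guard = ExitGuard(pred)
    entry_guard = EntryGuard(succ, exit_guard)
    pred.exit_guards.append(exit_guard)
    succ.entry_guards.append(entry_guard)
    return exit_guard, entry_guard


def delete_partition(p):
    # (i) Void the successor links of all exit guards that point into p.
    for entry in p.entry_guards:
        entry.exit_guard.counterpart = None
    # (ii) Notify successors that entry guards paired with p can go.
    for exit_guard in p.exit_guards:
        entry = exit_guard.counterpart
        if entry is not None:
            entry.owner.entry_guards.remove(entry)
            exit_guard.counterpart = None
    # (iii) Discard p's own translation structures.
    p.entry_guards.clear()
    p.exit_guards.clear()
    p.blocks.clear()
```

No translated block outside the border guards is visited,
which is the efficiency gain the border guard pairs are
meant to provide.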

Multithreaded Programs in Partitioning 

Multithreaded subject programs present a difficulty to
the translator 19 when utilizing partitioning, namely the
deletion of partitions must be performed in a thread-safe
manner. Specifically, while a given partition is being
deleted, threads must be prevented from entering that
partition. This guarantees that control always flows into
valid translations, and never into a deleted or invalid
block. After a partition has been deleted and recreated,
threads may then be allowed to enter the (newly emptied)
partition.

In a preferred embodiment, the translator 19 uses a
single global mutex ("the global partition lock") to
serialize particular partition operations, including: (i)
control flow jumps between partitions (i.e., border
crossings); and (ii) deletion, and where applicable
re-creation, of partitions. Thus, an exit border guard must
acquire the global partition lock before control passes to
its corresponding entry border guard. Likewise, a
notifyBlock must acquire the global partition lock before
destroying a partition. For modification events that
require the deletion and subsequent re-creation of one or
more partitions, all such deletions and re-creations are
performed atomically under the protection of the global
partition lock, meaning that all such operations are
performed in sequence without releasing the lock. The
translator 19 maintains a partition identifier for each
thread, which is changed when control passes between
partitions. This change is made by a border guard pair
while holding the global partition lock.
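
The serialization described above can be sketched with a
single mutex. This is an illustrative sketch only: the
names global_partition_lock, thread_partition, cross_border
and delete_and_recreate are hypothetical, and a set of live
partition names stands in for the translator's partition
structures.

```python
import threading

global_partition_lock = threading.Lock()
thread_partition = {}   # thread ident -> current partition identifier


def cross_border(target_partition, live_partitions):
    """Model a border guard pair; return True if the crossing succeeded."""
    # Exit guard: take the lock before control passes to the entry guard.
    with global_partition_lock:
        if target_partition not in live_partitions:
            return False   # successor deleted; caller must re-resolve it
        # Entry guard: record the thread's new partition, then the
        # lock is released on leaving the with-block.
        thread_partition[threading.get_ident()] = target_partition
        return True


def delete_and_recreate(partition, live_partitions):
    """Delete and re-create a partition without releasing the lock."""
    with global_partition_lock:
        live_partitions.discard(partition)
        # ... translation structures would be freed here ...
        live_partitions.add(partition)
```

Because both operations hold the same lock, no thread can
observe a partition between its deletion and re-creation.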

Referring now to FIG. 11, the control flow of a
translated program across partition boundaries is
illustrated, both with and without crossing a partition
boundary. The top half of FIG. 11 shows the control flow
of a particular subject program in the translator 19 when
partitioning is not being employed or when control flow
remains within a single partition. Control passes from
subject block A 201 to subject block B 203, then to
subject block C 205, then to subject block D 207. In
terms of the program subject code, block B 203 is block
A's 201 successor, block C 205 is block B's successor, and
block D 207 is block C's successor.

Alternatively, assuming a partition boundary 209
existed between subject block B and subject block C,
control would necessarily flow through a border guard pair
when crossing the partition boundary 209, as illustrated
in the lower portion of FIG. 11. Blocks A 201 and B 203
are both in partition 215, while blocks C 205 and D 207
are both in partition 217. In the translated subject
program, control passes directly from block A 201 to block
B 203, because blocks A 201 and B 203 are in the same
partition 215. The transfer of control from block B 203
to block C 205, however, crosses a partition boundary. As
such, control passes from block B 203 to an exit border
guard block 211. As described above, the exit border
guard 211 checks that its corresponding entry border guard
213 has not been deleted. In a multithreaded program
(discussed below), the exit border guard also acquires the
global partition lock, so that the partition boundary can
be crossed in a thread-safe manner. Control then passes
from the exit border guard block 211 to its corresponding
entry border guard block 213. Execution has now passed
into partition 217. In a multithreaded program (discussed
below), the entry border guard changes the partition
identifier of the current thread to reflect the fact that
the thread is now in partition 217 and the global
partition lock is then released. Once inside partition
217, control passes from the entry border guard 213 to
block C 205, and then directly from block C 205 to block D
207.

Memory Management 

Memory management is a critical consideration in the
various embodiments of the dynamic binary translator 19.
The memory demands of translating a program from one
architecture to another are high. The memory demands
become even higher when optimizations such as isoblocks,
extended blocks, and group blocks are introduced, as each
of these optimizations creates the possibility that one
sequence of subject code 17 may be represented by multiple
translations. A modification event requires deletion of a
partition, and therefore the deletion of all translation
data (i.e., block structures and target code) therein.

In one embodiment, the translator 19 provides its own
memory management subsystem which mirrors the subject code
partitions. Partitions are intended to group all
translator data structures by the subject code 17 regions
to which they correspond. Portions of subject code 17
that are likely to be invalidated together form a
partition: all related translator data structures
therefore reside in the same well-defined region of target
memory. If a partition is invalidated, all of the
translation data (translator structures and target code)
associated with that partition can be freed en masse,
avoiding the need to free every structure individually.

In this specific embodiment, the translator 19
performs all memory allocation through memorySource
objects. Each partition has one corresponding
memorySource. MemorySources obtain memory from the
operating system through conventional means such as mmap()
system calls, but they do so in bulk. Other translator
code (including pseudo-target code) obtains memory as
needed (i.e., in smaller quantities), but from
per-partition memorySources rather than directly from the
operating system. This improves the performance of the
translator 19 by reducing the number and frequency of
underlying memory-related system calls. MemorySources
also provide a function for flushing the entire contents
of the memorySource at once. The memorySource can
implement a flush by actually freeing the underlying
memory or by simply discarding all of the "allocations" it
has made from that memory (i.e., wiping the slate clean
while retaining the underlying memory).
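
A memorySource of the second kind (one that flushes by
discarding its allocations while retaining the underlying
memory) behaves like a bump allocator. The sketch below is
a hypothetical illustration: the class MemorySource and its
methods allocate and flush are assumed names, and a
bytearray stands in for a region that a real
implementation would obtain in bulk from the operating
system.

```python
class MemorySource:
    """Per-partition bump allocator over one bulk region (illustrative)."""

    def __init__(self, capacity):
        # One bulk acquisition up front; stands in for a single mmap() call.
        self._buffer = bytearray(capacity)
        self._next = 0

    def allocate(self, size):
        """Hand out the next `size` bytes of the region."""
        if self._next + size > len(self._buffer):
            raise MemoryError("memorySource exhausted")
        start = self._next
        self._next += size
        return memoryview(self._buffer)[start:start + size]

    def flush(self):
        """Discard every allocation at once, keeping the region itself."""
        self._next = 0

    @property
    def used(self):
        return self._next
```

Flushing costs the same regardless of how many structures
were allocated, since no structure is freed individually.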

The memory subsystem of the translator 19 also
simplifies the process of deleting a partition. To delete
a partition, the translator 19 must: (i) void the
"successor" links of all predecessor partitions' exit
guards; (ii) notify all successor partitions that their
corresponding entry guards can be discarded; and (iii)
delete all translation structures and target code
belonging to the partition. Without the memory subsystem,
the third step requires that the translator walk the
per-partition collections of translation structures and
target code to free each structure individually. With the
memory subsystem, all of the partition's structures can be
freed at once by simply flushing the memorySource.

Shared Cache

In another preferred embodiment, the translator 19
includes a shared code cache, which allows the target code
21 and translation structures corresponding to a
particular subject program to be shared between different
executions or runs of the translator 19. The shared code
cache is facilitated by a dedicated code cache server
process, which interacts with translators 19 at the
beginning and end of their executions.

In one embodiment, a code cache consists of one or
more memorySource objects and the regions of memory that
they own (i.e., partitions), which are stored by the cache
server as files. For example, in one embodiment of
PowerPC translators (which translate from the PowerPC
subject architecture to some target architecture), the
partition representing the PowerPC shared library region
(subject addresses 0x90000000 - 0xA0000000) is cached as
one file in the shared code cache.

Each code cache contains target code and translation
structures specific to both a particular subject program
and to a particular compilation or build of the translator
itself (data structures from different translator builds
may be binary-incompatible). To verify that an existing
code cache is compatible with a particular run of the
translator, various metrics are compared; any metric which
yields non-identical values indicates binary
incompatibility and renders the code cache incompatible.
The code cache binary compatibility metrics include the
date and time that the translator was built.
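
The compatibility test amounts to an all-or-nothing
comparison of metrics. In the sketch below, the function
is_compatible and the metric keys "build_time" and
"subject_program" are hypothetical; only the build date and
time is named in the text, and the second key is an assumed
example of a further metric.

```python
def is_compatible(cache_metrics, translator_metrics):
    """Return True only if every metric has an identical value."""
    if cache_metrics.keys() != translator_metrics.keys():
        return False   # a metric missing on either side counts as a mismatch
    return all(
        cache_metrics[name] == translator_metrics[name]
        for name in cache_metrics
    )


# Hypothetical metric sets for one cache file and two translator builds.
cache = {"build_time": "2003-07-16T10:00:00", "subject_program": "app"}
same_build = {"build_time": "2003-07-16T10:00:00", "subject_program": "app"}
newer_build = {"build_time": "2003-07-17T09:30:00", "subject_program": "app"}
```

With these values, the cache would be accepted by
same_build and rejected by newer_build.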

When translation of a subject program begins, the
translator 19 checks the code cache server for a
compatible code cache. If a compatible cache is found,
the translator 19 loads the cache, which consists of
target code 21 and translation structures. A cache
potentially contains all of the translated code 21 created
over the course of a translated execution of the subject
program, including optimized target code such as group
blocks. This allows a later translator execution to
piggyback on the efforts of earlier executions; large
sections of subject code may have already been translated
(reducing startup time and translation cost) and possibly
optimized.

The translator 19 loads a code cache file by mapping
it into memory as a shared object, similar to how a shared
library is loaded. The cache file is preferably shared
using a copy-on-write policy. Under a copy-on-write
policy, the cache file is initially shared across all
running translator processes. Copy-on-write means that
when a particular translator execution modifies the cached
structures in any way (e.g., incrementing a block's
execution count) the cache becomes exclusive to that
particular execution and the memory region is no longer
shared across multiple processes.
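
Copy-on-write mapping of a file can be demonstrated with
Python's standard mmap module, whose ACCESS_COPY mode keeps
writes private to the mapping and never updates the file.
This is only an analogy for the policy described above:
the helper names load_cache and demo are hypothetical, the
16-byte temporary file stands in for a cache file, and two
mappings in one process stand in for two translator
processes.

```python
import mmap
import os
import tempfile


def load_cache(path):
    """Map a cache file so that writes stay private to this mapping."""
    with open(path, "rb") as f:
        # ACCESS_COPY: copy-on-write; the underlying file is never modified.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


def demo():
    """Write through one mapping and observe the other mapping and the file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, bytes(16))   # a 16-byte cache file of zeros
        os.close(fd)
        first = load_cache(path)
        second = load_cache(path)
        first[0] = 5              # e.g. bump a block's execution count
        seen = (first[0], second[0])
        first.close()
        second.close()
        with open(path, "rb") as f:
            on_disk = f.read(1)
        return seen, on_disk
    finally:
        os.remove(path)
```

The write made through the first mapping is visible only
there; the second mapping and the file on disk keep the
original contents.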

When a translator execution completes (i.e., the
translated program terminates), the cache server compares
that execution's code cache to the cache stored by the
server. If the current execution's code cache is better
(i.e., has more block translations) than the stored
version, the server stores the translator's cache for
future use. As such, the quality of the code caches
stored in the server improves over time.
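
The server's replacement policy can be sketched as
follows. The class CodeCacheServer and its methods submit
and lookup are hypothetical names, and a set of translated
block addresses stands in for a real code cache; the
comparison by number of block translations follows the
text.

```python
class CodeCacheServer:
    """Keeps, per subject program, the cache with the most translations."""

    def __init__(self):
        self._stored = {}   # program name -> frozenset of block addresses

    def submit(self, program, cache_blocks):
        """Offer a finished execution's cache; return True if it was stored."""
        stored = self._stored.get(program)
        if stored is None or len(cache_blocks) > len(stored):
            self._stored[program] = frozenset(cache_blocks)
            return True
        return False

    def lookup(self, program):
        """Return the stored cache for a program, or None if there is none."""
        return self._stored.get(program)
```

A cache is replaced only by one with strictly more block
translations, so the stored cache never gets smaller.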

Although a few preferred embodiments have been shown
and described, it will be appreciated by those skilled in
the art that various changes and modifications might be
made without departing from the scope of the invention, as
defined in the appended claims.

Attention is directed to all papers and documents
which are filed concurrently with or previous to this
specification in connection with this application and
which are open to public inspection with this
specification, and the contents of all such papers and
documents are incorporated herein by reference.

All of the features disclosed in this specification
(including any accompanying claims, abstract and
drawings), and/or all of the steps of any method or
process so disclosed, may be combined in any combination,
except combinations where at least some of such features
and/or steps are mutually exclusive.

Each feature disclosed in this specification
(including any accompanying claims, abstract and drawings)
may be replaced by alternative features serving the same,
equivalent or similar purpose, unless expressly stated
otherwise. Thus, unless expressly stated otherwise, each
feature disclosed is one example only of a generic series
of equivalent or similar features.

The invention is not restricted to the details of the
foregoing embodiment(s). The invention extends to any
novel one, or any novel combination, of the features
disclosed in this specification (including any
accompanying claims, abstract and drawings), or to any
novel one, or any novel combination, of the steps of any
method or process so disclosed.

Claims

1. A method of partitioning code during a translation
of subject code into translated target code to account for
self-modifying subject code, comprising:

identifying self-modifying code events in said subject
code; and

dividing a region of memory containing said subject
code into at least one partition when identifying a
self-modifying code event, wherein each partition includes a
range of subject code addresses in said memory which are
affected by a respective self-modifying code event.

2. The method of claim 1, wherein each partition 
further includes translated target code corresponding to 
subject code contained in that partition. 

3. The method of claim 1 or 2, wherein each said
partition represents a region of memory that does not
overlap with regions of memory corresponding to other
partitions.

4. The method of any preceding claim, wherein a
self-modifying code event modifies a respective range of
subject code addresses, said method further comprising:

modifying partitions existing in said memory that
possess subject code addresses which are affected by said
self-modifying code event.

5. The method of claim 4, wherein said partition
modifying step comprises:

creating a new partition to include modified subject
code corresponding to the self-modifying code event; and

for existing partitions having ranges of subject code
addresses which overlap with the subject code addresses of
the newly created partition, modifying said existing
partitions to delete the subject code addresses from said
existing partitions that overlap with the subject code
addresses of the newly created partition such that the
partitions no longer overlap.

6. The method of claim 5, wherein each partition
further includes translated target code corresponding to
subject code contained in that partition, said method
further comprising:

deleting translated target code associated with
partitions that have been modified in response to the
self-modifying code event; and

translating new target code for the subject code
contained in the modified partitions.

7. The method of claim 5 or 6, further comprising
adding translated target code to a partition as
corresponding subject code in that partition is
translated.



8. The method of claim 7, wherein each partition
includes a particular range of subject code addresses that
have been translated, such that the particular range of
subject code addresses having been translated comprises an
active range within the partition, said method further
comprising:

determining whether the subject code addresses of said
newly created partition overlap with any subject code
addresses in said active range of any partition; and

for existing partitions having an active range that
overlaps with the subject code addresses of said newly
created partition, deleting translated target code
associated with partitions that have been modified in
response to the self-modifying code event, and translating
new target code for the subject code contained in the
modified partitions.

9. The method of claim 8, wherein each partition
includes a range of subject code addresses that have not
been translated referred to as an inactive range within
the partition, said method further comprising:

for existing partitions having an active range which
does not overlap with the subject code addresses of said
newly created partition but having an inactive range that
does overlap with the subject code addresses of said newly
created partition, modifying said existing partitions to
delete the subject code addresses from said inactive
ranges in said existing partitions that overlap with the
subject code addresses of the newly created partition such
that the partitions no longer overlap, and leaving the
translated target code associated with active ranges in
said existing partitions unchanged.

10. The method of any of claims 4 to 9, further
comprising:

identifying partitions that are adjacent to one
another in memory having characteristics that allow them
to be combined; and

aggregating said adjacent partitions into a single,
combined partition.

11. The method of any preceding claim, wherein said 
self-modifying code event is identified during decoding of 
the subject code, said method further comprising inserting 
a special translation structure into a control flow of the 
translated target code as a representation of the 
identified self-modifying code event.

12. The method of claim 11, in response to 
encountering said special translation structure during 
execution of the translated target code, said method 
further comprising: 

identifying the range of subject code addresses 
affected by the self-modifying code event, and creating
the partition in memory using this identified range of 
subject code addresses. 

13. The method of any preceding claim, wherein each
partition includes a pair of border guards which
facilitate control flow passing between partitions,
wherein said pair of border guards includes an entry
border guard and an exit border guard, such that each exit
border guard contains a specific reference to a
counterpart entry border guard in a succeeding partition
to be executed next.

14. The method of claim 13, when encountering an exit 
border guard during execution of a current partition, said 
method further comprising verifying that a counterpart 
entry border guard exists in a successive partition before 
passing control from the current partition to the 
successive partition. 

15. The method of claim 13 or 14, wherein a set of
border guards exists containing entry border guards and
exit border guards for all partitions, said method further
comprising modifying said set of border guards whenever a
new partition is created in response to a self-modifying
code event.



16. The method of any of claims 4 to 15, wherein when 
subject code defines a multi-threaded program, said method
further comprising preventing other threads from entering 
a partition while the partition is being modified by 
another thread. 

17. The method of any of claims 4 to 16, wherein each
partition further includes translated target code
corresponding to subject code contained in that partition,
wherein each partition includes a pair of border guards
which facilitate control flow passing between partitions,
wherein said pair of border guards includes an entry
border guard and an exit border guard, such that each exit
border guard contains a specific reference to a
counterpart entry border guard in a succeeding partition
to be executed next, said method further comprising:

providing a memory management subsystem having regions
which mirror the subject code partitions, wherein said
memory management subsystem stores target code and border
guard pairs associated with a partition along with its
corresponding subject code; and

deleting an entire region of said memory management
subsystem that corresponds to a specific partition
whenever that specific partition is modified.

18. A computer readable storage medium having
translator software resident thereon in the form of
computer readable code executable by a computer for
performing a method of partitioning code during a
translation of subject code into translated target code to
account for self-modifying subject code, said method
comprising:

identifying self-modifying code events in said subject
code; and

dividing a region of memory containing said subject
code into at least one partition when identifying a
self-modifying code event, wherein each partition includes a
range of subject code addresses in said memory which are
affected by a respective self-modifying code event.

19. The computer-readable storage medium of claim 18,
wherein each partition further includes translated target
code corresponding to subject code contained in that
partition.

20. The computer-readable storage medium of claim 18
or 19, wherein each said partition represents a region of
memory that does not overlap with regions of memory
corresponding to other partitions.

21. The computer-readable storage medium of claim 18,
19 or 20, wherein a self-modifying code event modifies a
respective range of subject code addresses, said method
further comprising:

modifying partitions existing in said memory that
possess subject code addresses which are affected by said
self-modifying code event.

22. The computer-readable storage medium of claim 21,
wherein said partition modifying step comprises:

creating a new partition to include modified subject
code corresponding to the self-modifying code event; and

for existing partitions having ranges of subject code
addresses which overlap with the subject code addresses of
the newly created partition, modifying said existing
partitions to delete the subject code addresses from said
existing partitions that overlap with the subject code
addresses of the newly created partition such that the
partitions no longer overlap.



23. The computer-readable storage medium of claim 22,
wherein each partition further includes translated target
code corresponding to subject code contained in that
partition, said method further comprising:

deleting translated target code associated with
partitions that have been modified in response to the
self-modifying code event; and

translating new target code for the subject code
contained in the modified partitions.

24. The computer-readable storage medium of claim 22
or 23, said method further comprising adding translated
target code to a partition as corresponding subject code
in that partition is translated.

25. The computer-readable storage medium of any of
claims 22 to 24, wherein each partition includes a
particular range of subject code addresses that have been
translated, such that the particular range of subject code
addresses having been translated comprises an active range
within the partition, said method further comprising:

determining whether the subject code addresses of said
newly created partition overlap with any subject code
addresses in said active range of any partition; and

for existing partitions having an active range that
overlaps with the subject code addresses of said newly
created partition, deleting translated target code
associated with partitions that have been modified in
response to the self-modifying code event, and translating
new target code for the subject code contained in the
modified partitions.

26. The computer-readable storage medium of claim 25,
wherein each partition includes a range of subject code
addresses that have not been translated referred to as an
inactive range within the partition, said method further
comprising:

for existing partitions having an active range which
does not overlap with the subject code addresses of said
newly created partition but having an inactive range that
does overlap with the subject code addresses of said newly
created partition, modifying said existing partitions to
delete the subject code addresses from said inactive
ranges in said existing partitions that overlap with the
subject code addresses of the newly created partition such
that the partitions no longer overlap, and leaving the
translated target code associated with active ranges in
said existing partitions unchanged.

27. The computer-readable storage medium of any of
claims 21 to 26, further comprising:

identifying partitions that are adjacent to one
another in memory having characteristics that allow them
to be combined; and

aggregating said adjacent partitions into a single,
combined partition.

28. The computer-readable storage medium of any of
claims 18 to 27, wherein said self-modifying code event is
identified during decoding of the subject code, said
method further comprising inserting a special translation
structure into a control flow of the translated target
code as a representation of the identified self-modifying
code event.

29. The computer-readable storage medium of claim 28,
in response to encountering said special translation
structure during execution of the translated target code,
said method further comprising:

identifying the range of subject code addresses
affected by the self-modifying code event, and creating
the partition in memory using this identified range of
subject code addresses.

30. The computer-readable storage medium of any of
claims 18 to 29, wherein each partition includes a pair of
border guards which facilitate control flow passing
between partitions, wherein said pair of border guards
includes an entry border guard and an exit border guard,
such that each exit border guard contains a specific
reference to a counterpart entry border guard in a
succeeding partition to be executed next.

31. The computer-readable storage medium of claim 30,
when encountering an exit border guard during execution of
a current partition, said method further comprising
verifying that a counterpart entry border guard exists in
a successive partition before passing control from the
current partition to the successive partition.

32. The computer-readable storage medium of claim 30
or 31, wherein a set of border guards exists containing
entry border guards and exit border guards for all
partitions, said method further comprising modifying said
set of border guards whenever a new partition is created
in response to a self-modifying code event.

33. The computer-readable storage medium of any of
claims 21 to 32, wherein when subject code defines a
multi-threaded program, said method further comprising
preventing other threads from entering a partition while
the partition is being modified by another thread.

34. The computer-readable storage medium of any of
claims 21 to 33 wherein each partition further includes
translated target code corresponding to subject code
contained in that partition, wherein each partition
includes a pair of border guards which facilitate control
flow passing between partitions, wherein said pair of
border guards includes an entry border guard and an exit
border guard, such that each exit border guard contains a
specific reference to a counterpart entry border guard in
a succeeding partition to be executed next, said method
further comprising:

providing a memory management subsystem having regions
which mirror the subject code partitions, wherein said
memory management subsystem stores target code and border
guard pairs associated with a partition along with its
corresponding subject code; and

deleting an entire region of said memory management
subsystem that corresponds to a specific partition
whenever that specific partition is modified.

35. A translator apparatus for use in a target
computing environment having a processor and a memory
coupled to the processor for partitioning code during a
translation of subject code into translated target code to
account for self-modifying subject code, comprising:

a self-modifying code identifying mechanism configured
for identifying self-modifying code events in said subject
code; and

a partitioning mechanism configured for dividing a
region of memory containing said subject code into at
least one partition when identifying a self-modifying code
event, wherein each partition includes a range of subject
code addresses in said memory which are affected by a
respective self-modifying code event.

36. The translator apparatus of claim 35, wherein each 
partition further includes translated target code 
corresponding to subject code contained in that partition. 

37. The translator apparatus of claim 36, wherein each 
said partition represents a region of memory that does not 
overlap with regions of memory corresponding to other 
partitions.

38. The translator apparatus of claim 35, 36 or 37,
wherein a self-modifying code event modifies a respective
range of subject code addresses, said translator apparatus
further comprising:

a partition modifying mechanism configured for
modifying partitions existing in said memory that possess
subject code addresses which are affected by said
self-modifying code event.

39. The translator apparatus of claim 38, wherein said
partition modifying mechanism further:

creating a new partition to include modified subject
code corresponding to the self-modifying code event; and

for existing partitions having ranges of subject code
addresses which overlap with the subject code addresses of
the newly created partition, modifying said existing
partitions to delete the subject code addresses from said
existing partitions that overlap with the subject code
addresses of the newly created partition such that the
partitions no longer overlap.

40. The translator apparatus of claim 39, wherein each
partition further includes translated target code
corresponding to subject code contained in that partition,
said translator apparatus further comprising:

a target code deletion mechanism configured for
deleting translated target code associated with partitions
that have been modified in response to the self-modifying
code event; and

a target code retranslation mechanism configured for
translating new target code for the subject code contained
in the modified partitions.



41. The translator apparatus of claim 39 or 40,
further comprising a target code addition mechanism for
adding translated target code to a partition as
corresponding subject code in that partition is
translated.



42. The translator apparatus of claim 41, wherein each
partition includes a particular range of subject code
addresses that have been translated, such that the 
particular range of subject code addresses having been 
translated comprises an active range within the partition, 
said translator apparatus further comprising: 

an active range detection mechanism configured for 
determining whether the subject code addresses of said
newly created partition overlap with any subject code 
addresses in said active range of any partition; and 

for existing partitions having an active range that 
overlaps with the subject code addresses of said newly 
created partition, said partition modifying mechanism 
further configured for deleting translated target code 
associated with partitions that have been modified in 
response to the self-modifying code event, and translating
new target code for the subject code contained in the 
modified partitions. 

43. The translator apparatus of claim 42, wherein each
partition includes a range of subject code addresses that
have not been translated, referred to as an inactive range
within the partition, said partition modifying mechanism
further configured for: 

for existing partitions having an active range which
does not overlap with the subject code addresses of said
newly created partition but having an inactive range that
does overlap with the subject code addresses of said newly
created partition, modifying said existing partitions to
delete the subject code addresses from said inactive
ranges in said existing partitions that overlap with the
subject code addresses of the newly created partition such 
that the partitions no longer overlap, and leaving the 
translated target code associated with active ranges in 
said existing partitions unchanged. 
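
The distinction drawn in claims 42 and 43 between active and inactive ranges can be pictured with a small decision function. This is a hedged sketch: the function names, the `(start, end)` tuple representation, and the assumption that each partition has exactly one contiguous active range and one contiguous inactive range are all illustrative and not taken from the claims.

```python
def overlaps(a, b):
    """True if half-open ranges a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def classify(active, inactive, modified):
    """Decide how an existing partition reacts to a modified address range.

    Returns "retranslate" if the active (already translated) range is hit,
    "trim" if only the inactive (untranslated) range is hit, else "keep".
    """
    if overlaps(active, modified):
        return "retranslate"  # claim 42: delete and retranslate target code
    if overlaps(inactive, modified):
        return "trim"  # claim 43: shrink inactive range, keep target code
    return "keep"


# Hypothetical partition: translated up to 0x1400, untranslated beyond that.
action = classify((0x1000, 0x1400), (0x1400, 0x2000), (0x1800, 0x1900))
```

Here `action` is `"trim"`, because the modified addresses touch only the untranslated part of the partition and the existing translations can be left in place.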

44. The translator apparatus of any of claims 38 to 43,
further comprising:

an adjacent partition identifying mechanism configured 
for identifying partitions that are adjacent to one
another in memory having characteristics that allow them 
to be combined; and 

a partition aggregation mechanism configured for 
aggregating said adjacent partitions into a single,
combined partition. 
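
A minimal sketch of the aggregation in claim 44 follows, assuming partitions are `(start, end)` tuples over half-open address ranges. The claim does not say which characteristics allow two partitions to be combined, so the sketch takes that test as a caller-supplied predicate, here named `can_merge`; both the name and the representation are hypothetical.

```python
def aggregate(partitions, can_merge):
    """Merge neighbouring (start, end) ranges for which can_merge(a, b) holds."""
    merged = []
    for p in sorted(partitions):
        # Adjacent means the previous range ends exactly where this one starts.
        if merged and merged[-1][1] == p[0] and can_merge(merged[-1], p):
            merged[-1] = (merged[-1][0], p[1])
        else:
            merged.append(p)
    return merged


combined = aggregate([(0, 16), (16, 32), (48, 64)], lambda a, b: True)
```

With a predicate that always permits merging, `combined` is `[(0, 32), (48, 64)]`: the first two ranges are adjacent and are joined, while the third is separated by a gap and stays as it is.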

45. The translator apparatus of any of claims 35 to 44,
wherein said self-modifying code event is identified
during decoding of the subject code, said translator
apparatus further comprising a notifying mechanism for
inserting a special translation structure into a control
flow of the translated target code as a representation of
the identified self-modifying code event.

46. The translator apparatus of claim 45, in response
to encountering said special translation structure during
execution of the translated target code, said translator
apparatus configured for:

identifying the range of subject code addresses
affected by the self-modifying code event, and creating
the partition in memory using this identified range of
subject code addresses.

47. The translator apparatus of claim 45 or 46,
wherein each partition includes a pair of border guards
which facilitate control flow passing between partitions, 
wherein said pair of border guards includes an entry 
border guard and an exit border guard, such that each exit 
border guard contains a specific reference to a 
counterpart entry border guard in a succeeding partition 
to be executed next.

48. The translator apparatus of claim 47, when
encountering an exit border guard during execution of a
current partition, said translator apparatus further 
configured for verifying that a counterpart entry border 
guard exists in a successive partition before passing 
control from the current partition to the successive 
partition. 
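
The check described in claims 47 and 48 can be sketched as a lookup performed before control is handed over. The data layout is an assumption for illustration: an exit border guard is modelled as a `(successor, guard_id)` pair and the live entry border guards as a dictionary of sets. What the apparatus does when the counterpart is missing is not stated in the claim; the sketch simply returns `None`.

```python
def follow_exit_guard(exit_guard, entry_guards):
    """Return the successor partition if its counterpart entry guard exists.

    exit_guard: (successor_partition, entry_guard_id) held by the exit guard.
    entry_guards: dict mapping partition name -> set of live entry guard ids.
    """
    successor, guard_id = exit_guard
    if guard_id in entry_guards.get(successor, set()):
        return successor  # counterpart verified: control may pass
    return None  # counterpart missing, e.g. the successor was modified


# Hypothetical state: partition "B" currently exposes entry guard 7 only.
live_entry_guards = {"B": {7}}
target = follow_exit_guard(("B", 7), live_entry_guards)
```

In this example `target` is `"B"`. An exit guard that refers to an entry guard no longer present yields `None` instead of passing control to a stale location.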

49. The translator apparatus of claim 47 or 48,
wherein a set of border guards exists containing entry
border guards and exit border guards for all partitions,
said translator apparatus further configured for modifying
said set of border guards whenever a new partition is
created in response to a self-modifying code event.



50. The translator apparatus of any of claims 38 to 49,
wherein when subject code defines a multi-threaded
program, said translator apparatus further configured for
preventing other threads from entering a partition while
the partition is being modified by another thread.
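
One simple way to picture the exclusion in claim 50 is a lock held per partition. The sketch below is an assumption, not the claimed mechanism: the class `GuardedPartition` and its methods are hypothetical, and a plain mutex also serialises threads that only want to enter, which is stricter than the claim requires.

```python
import threading


class GuardedPartition:
    """Hypothetical partition whose contents are protected by a lock."""

    def __init__(self, code):
        self._lock = threading.Lock()
        self.code = code

    def modify(self, new_code):
        # While a thread holds the lock here, no other thread can enter.
        with self._lock:
            self.code = new_code

    def enter(self):
        # Entering waits until any modification in progress has finished.
        with self._lock:
            return self.code


partition = GuardedPartition("original translation")
worker = threading.Thread(target=partition.modify, args=("new translation",))
worker.start()
worker.join()
seen = partition.enter()
```

Because the modifying thread is joined before `enter` is called, `seen` holds `"new translation"`. A reader-writer lock would be a natural refinement if many threads execute a partition concurrently.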

51. The translator apparatus of any of claims 38 to 50,
wherein each partition further includes translated
target code corresponding to subject code contained in
that partition, wherein each partition includes a pair of
border guards which facilitate control flow passing
between partitions, wherein said pair of border guards
includes an entry border guard and an exit border guard,
such that each exit border guard contains a specific
reference to a counterpart entry border guard in a
succeeding partition to be executed next, said translator
apparatus further configured for:

providing a memory management subsystem having regions
which mirror the subject code partitions, wherein said 
memory management subsystem stores target code and border 
guard pairs associated with a partition along with its 
corresponding subject code; and 

deleting an entire region of said memory management
subsystem that corresponds to a specific partition
whenever that specific partition is modified.

ABSTRACT

METHOD AND APPARATUS FOR PARTITIONING
CODE IN PROGRAM CODE CONVERSION



A partitioning technique utilized by a translator to 
divide the subject code space into regions, referred to as 
partitions, where each partition contains a distinct set 
of basic blocks of subject code and corresponding target 
code. The partitioning technique divides the translator's 
representation of subject code and subject code 
translations into non-overlapping regions of subject
memory. In this manner, when the subject program modifies
subject code, only those partitions actually affected by
the self-modifying code need be discarded and all
translations in unaffected partitions can be kept. This
partitioning technique is advantageous in limiting the
amount of target code that must be retranslated in
response to self-modifying code operation. In another
process, the partitioning technique allows multithreaded
subject programs that also involve self-modifying code to
perform code modification in a thread-safe manner.

[Figure 11]